My testing: Cycles on an R9 390X + GTX 650 + i7-5820K on OpenCL

Yes, I got it to work on all three devices, and the render even came out flawlessly with no broken tiles.
Sounds like a dream, but there are a few problems. Regardless, I think there is massive potential here, so let me explain…

My R9 390X can do the BMW27 benchmark in 54 seconds flat with a single tile that covers the entire frame, which is possible because it has 8 GB of GDDR5 on it.

  1. When rendering with other devices alongside the 390X, the tile size has to be smaller, which in turn bottlenecks my 390X. The tile size for the CPU must be under 128 or it's just painfully slow, and the GTX 650 would not do well above a tile size of about 128 either. This meant that while my 390X would finish all the tiles it could rather quickly, the last two tiles belonging to the i7 and the 650 made the render take quite a bit longer than if I had just used the 390X alone.

  2. I could not tell Blender to use just the R9 390X + 650, nor just the CPU plus one of the GPUs. I thought it would be better to exclude the CPU, since it renders the slowest tile and needs the smallest one too, but you cannot configure it like this. (See the sketch after this list.)

  3. Realtime preview rendering does not work well, if at all. You must be able to specify which device the realtime render uses, because when I was using all the devices it either did not work or was really slow.
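For context, here is a minimal sketch of why those device subsets could not be selected, assuming the 2.7x-era Python preferences (the property names are from memory, so treat them as assumptions): the compute device was a single enum value, not a per-device checklist.

import bpy

# 2.7x-era user preferences (assumed names): one device type, one device slot.
prefs = bpy.context.user_preferences.system
print(prefs.compute_device_type)  # e.g. 'OPENCL' or 'CUDA'
print(prefs.compute_device)       # one enum entry; no way to tick "390X + 650" only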

SUGGESTIONS

  1. Add a custom tile size option for each device. If I had this option, I would have made the 390X do around 512, the 650 around 128, and the CPU around 16. This really would have sped up my overall render time, and it effectively removes or reduces the bottlenecking problem. (A rough workaround is sketched after this list.)

  2. Let me specify which single device the realtime render uses.

  3. Keep developing to reduce crashes and bugs. I didn't have any issues once I set things up right, but others apparently did.
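Until something like suggestion 1 exists, a crude workaround is to run separate render passes, one per device, setting the global tile size to suit whichever device is active. A minimal sketch, assuming the 2.7x Python API (render.tile_x/tile_y and the render operator):

import bpy

scene = bpy.context.scene

# Pass for the 390X: one large tile suits a fast GPU.
scene.render.tile_x = 512
scene.render.tile_y = 512
bpy.ops.render.render(write_still=True)

# A separate session with the CPU selected would use small tiles instead,
# e.g. scene.render.tile_x = scene.render.tile_y = 16.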

Hope this helps!
~John

I am looking into purchasing a 390X, so thank you for this benchmark.

Would it be possible to get you to run the benchmark again with the following options?
– 390X only
– spatial splits turned off
– tile size 960 x 540
– compositing / sequencer turned off
– stamp turned on (for the render time in the next step; it's under the metadata render settings)
– render via the Python console: open the Python console as one of the windows and type bpy.ops.render.render(). Nothing will actually come up on screen, but it will render in the background; on CUDA this is actually faster. (A short sketch follows below.)
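To spell that last step out, here is a minimal sketch of the console session (the stamp property names are my assumption of the 2.7x API, so double-check them against the Metadata panel):

import bpy

# Turn the stamp on so the render time is written into the image metadata.
bpy.context.scene.render.use_stamp = True
bpy.context.scene.render.use_stamp_render_time = True  # assumed property name

# Nothing appears on screen; the console simply blocks until the render finishes.
bpy.ops.render.render()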

Cheers

Carlo

Everything is set except for that last thing.

I put
bpy.ops.render.render in the console and nothing seemed to happen; it just returns with…

Render active scene

bpy.ops.render.render(animation=False, write_still=False, use_viewport=False, layer="", scene="")

Not sure what's going on here; I've never rendered this way before.

It needs the () at the end, as in


bpy.ops.render.render()
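For anyone else who hits this: in the Python console, naming an operator without parentheses only echoes its signature, while adding the parentheses actually calls it.

bpy.ops.render.render     # prints the operator's signature, renders nothing
bpy.ops.render.render()   # actually runs the render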

Thanks for this by the way :)


No problem, dude!

EDIT:
Oh, and I had a light 3D game running in the background, so imagine a second or two off of that. From my previous testing, running this game in the background only slows the render by a second or two.

LOL, you've got to remember too that this was just implemented for AMD, and there will be further optimizations for the architecture and OpenCL.

Seems like I have found my next card :)

Going by the feature table at https://www.blender.org/manual/render/cycles/features.html, CUDA supports 9 of 11 features while AMD/OpenCL supports 4 of 11, and nobody knows when OpenCL will at least catch up to CUDA's feature count. I think CUDA is the better choice for now, and judging by http://render.otoy.com/octanebench/results.php?sort_by=avg&singleGPU=1 the GTX 970 is the best price/speed.

Eh, give it a couple of releases; the support will be there, since the groundwork has already been done.

From memory, the only things we use (archviz animation) that aren't in the feature set yet are support for HDR textures and transparent shadows… Transparent shadows are enabled as an experimental feature, so the main thing now is HDR textures.

We are using GTX 580s/590s, which are still faster than the 970…

@John: More details would be nice. Which driver version did you use? Which OS? Which 390X do you have, and what are the clocks?
I ran the BMW27 benchmark with my R9 290X at default clocks and don't get below 1:20. However, I don't think it has anything to do with the 8 GiB of memory in this case; my card doesn't even fill 2 of its 4 GiB for this scene. Even overclocking to 1100/1400 MHz (core/memory) didn't bring more speed. I think the main reason would be the driver, as I use the fairly old 14.12 (original Omega release) on Windows 7.
I'm very curious to see how this goes, especially once I switch to Windows 10.

I need to get that SSS out of experimental (CUDA). It's OK if it crashes, but I'd like to render SSS with lower VRAM. One option would be a third GPU mode, experimental_may_crash, where it uses the same amount of memory as normal GPU rendering, but the user understands that it may crash at any point.

I'm also eager to see how AMD OpenCL progresses, and that idea of a tile size option per rendering device is great. That way there could be many compute devices with different speeds without generating the bad bottleneck it does right now. I think this should be submitted as an official feature request.

SSS is one of the reasons the VRAM usage is so high…

If you want lower VRAM usage, compile Cycles yourself, disabling the features you do not need (volumetrics and hair are both big VRAM offenders).

@doublebishop

Well, here are my results with some 970s using the settings you posted above, except I left the tile size at 256x256.
2x 970: (result screenshot)

1x 970: (result screenshot)

1x 580 = 1:18… Seems like they are on par with the 970s.

I'm going to force myself to wait for the switch to 14 nm before I upgrade from the 7970… Also, I need OpenCL to handle volumetric lighting, HDRI, and all those other features that it currently does not. Even in LuxRender, OpenCL can't handle volume rendering.

Cycles CUDA barely handles volumes: 10 CPU samples ≈ 100 GPU samples.

From my tests, the new Win7/8 drivers after Omega didn't change performance; only the Windows 10 one brings a huge speedup. As for your 290X: the 390X is not just a rebranded card. It has a new voltage controller that really reduces power consumption and thus temperature. 290X cards are fast but tend to overheat and throttle (reduce their frequency) in summer. Monitor it with GPU-Z if you want; that may explain why the overclocking doesn't bring anything.

Wrong. The Hawaii chip on the 290X and 390X is exactly the same. The most obvious change is the new driver (not Omega iteration 2, which is still upcoming), which has not yet been released for the 200 series. Some of the 300-series cards got new PCBs and coolers from the board manufacturers, and the BIOS and microcode were updated. It's wrong to say the 300 series reduced power consumption; the MSI 390X, for example, draws up to 400 watts now.
Of course my card wasn't throttling. I'd even say that is not going to happen running Cycles, as the card is nowhere near its power or temperature limits. Cycles seems unable to utilise the full computing power atm.

Clocks are:
1125 MHz on the core
1720 MHz on the memory
Windows 8.1

Also, despite my positive experience with it, I plan to return this 390X to Best Buy in a few days, because the air-cooled Fury cards come out next week and should be better; that HBM is around 4 times faster or something crazy like that. I might even stretch for the water-cooled one.

Overall I really like this card, though mine had a little bit of capacitor whine sometimes.

Thanks.
You shouldn't expect something like that. The bandwidth of the Fury X's HBM is 512 GiB/s, and your memory bandwidth at 1720 MHz is already ~440 GiB/s (see the quick calculation below), so it's not that huge of an improvement. However, the Fury will offer 56 compute units vs. 44 on the 290X/390X (3584 vs. 2816 shader processors). I don't know how much improvement in rendering speed you will get, as the shader units seem to be a bit 'underemployed' while running Cycles (power consumption is also low). So this is what I'm very curious about, and I hope somebody with a Fury X runs the Cycles benchmark scene. Also consider that the Fury has half the amount of memory, so you would be limited to less complex scenes.
The whine should come from coils/chokes, not caps, by the way. Do you get it while computing? If it's from games, you could just use the new FRTC.
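For reference, the ~440 GiB/s figure follows from the 390X's 512-bit memory bus and GDDR5 moving four transfers per clock; a quick back-of-the-envelope check:

bus_width_bytes = 512 / 8      # R9 390X memory bus: 512 bits = 64 bytes per transfer
mem_clock_mhz = 1720           # John's overclocked memory clock
transfers_per_clock = 4        # GDDR5 is quad-pumped
print(bus_width_bytes * mem_clock_mhz * transfers_per_clock / 1000)  # ≈ 440 GB/s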