Good news for AMD/ATI graphics card owners

The lowest series of cards that works is the 5xxx series, at least according to DingTo…

Not only compilation time, but also memory usage. With “SVM” enabled (I don’t know what it stands for) it’s quite low. With “EMISSION” there’s a significant bump in memory usage (and it fails to render anyway). By the way, removing “SVM” as well lowers the kernel compilation time to 2.15 seconds, but it results in a clay render. By commenting out other features it is possible to reduce this time further, but then glitches, a black screen or a program crash can start occurring.

My uneducated opinion (I’m not an expert in this area) is that since Cycles is made primarily around CUDA and NVidia limitations, it is causing problems on AMD cards even though the language set is restricted to make it work on OpenCL. That it works in OpenCL on NVidia cards therefore would not be a guarantee that it does on AMD ones as well. Besides, something broke between version 2.62 and 2.63, and no NVidia user noticed as far as I know.

I’m curious to know what hardware Brecht is working on, by the way.
Maybe, after Mango is completed, somebody should send him a Radeon 7xxx.

I tried rendering several cars on my 1 GB video card (Sapphire HD7770 1GB) with Blender 2.62 64 bit official and 12.5beta Catalyst drivers:

http://i.imgur.com/RPMX7l.jpg

17 cars + 1 I partially duplicated by mistake, at 2048x2048 resolution
6:05 minutes until the render started (due to BVH and hidden kernel compilation)
2:20 minutes (approximately) for 15 passes (the render was stopped a bit later)
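
To put those timings in perspective, here is a rough back-of-the-envelope calculation. It is only a sketch using the figures quoted above; the helper name `to_seconds` is made up for illustration, and the 2:20 figure is approximate to begin with.

```python
def to_seconds(mmss: str) -> int:
    """Convert an 'M:SS' string such as '6:05' into seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


preprocess_s = to_seconds("6:05")  # BVH build + hidden kernel compilation
render_s = to_seconds("2:20")      # approximate time for the 15 passes
passes = 15

seconds_per_pass = render_s / passes
overhead_share = preprocess_s / (preprocess_s + render_s)

print(f"{seconds_per_pass:.1f} s per pass")
print(f"{overhead_share:.0%} of the total time spent before the first pass")
```

With these numbers, that comes to roughly 9.3 seconds per pass, with about 72% of the wall-clock time gone before the first pass even appears.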

The video card wasn’t working at 100% due to “swapping” between system memory and VRAM, but overall it was still way faster than using the CPU alone (I have a 3.16 GHz dual core Intel Core 2 Duo E8500).

I rendered the scene with its default settings. No rust to be seen:

http://i.imgur.com/zl9JFl.jpg

The actual rendering process started at minute 3:15. It looks like the kernel does get recompiled silently each time.
I used Blender 2.62 win64 official (the only one which partially works with AMD cards; but please note the preprocessing memory usage).

s12a, the render times and compiling times you get baffle me… Especially the compile times…

Can you check something? When you render something, the kernel is compiled first (for the very first render), and then it seems to get re-compiled just before showing frame 1.

During that whole process, are both of your cores working or just one?

Another question: what about the viewport? Here, once the kernel has been compiled, rendering is direct! I don’t get the “silent re-compile stuff” I get for the actual F12 renders. What about you?

Farmfield,

I did test it out… And put the results in the other thread :smiley: I’m obtaining the same results as s12a.

But something that is really making me angry right now is that with the latest Blender release, 2.63a, the kernel again fails to compile!

Another question: what about the viewport? Here, once the kernel has been compiled, rendering is direct! I don’t get the “silent re-compile stuff” I get for the actual F12 renders. What about you?
Direct rendering here, works as it should.

Unfortunately, only one. CPU usage stays at about 50% during the entire process.

Another question: what about the viewport? Here, once the kernel has been compiled, rendering is direct! I don’t get the “silent re-compile stuff” I get for the actual F12 renders. What about you?
On my PC I get the ~3 minutes preprocessing delay even when using viewport rendering, unfortunately. So it’s not really usable for trying changes on the fly. Just for navigating around the model.

Aah, 2.62, then I could have told you it wouldn’t work; it renders like that on Nvidia with 2.62 too. You need 2.63 or later (but a build from before the BVH enhancements) to render it correctly.

But my mistake, I thought you were running 2.63… :stuck_out_tongue:

Soo… Still playing with 2.62 and the 12.5b.
I have installed the OpenCL 1.2 beta drivers. The only difference I see is that, apparently, there is only a “silent kernel build”. Yesterday I had “building kernel” in the console and, after 13 minutes, another 13 minutes of silence before rendering. Today I only got 14 minutes of silence in total.
Still using only one core, and the render time for Mike’s car is about four and a half minutes, like yesterday.

here:
http://developer.amd.com/sdks/AMDAPPSDK/downloads/Pages/default.aspx

AMD mentions additional files to copy somewhere, but I don’t know where! Where is the OpenCL directory they are talking about?

Another change:
yesterday, once the kernel had been built once, the viewport was working as it should.
Today it starts to silently compile first.

Grrr…

Method, method… I need method !

My uneducated opinion (I’m not an expert in this area) is that since Cycles is made primarily around CUDA and NVidia limitations, it is causing problems on AMD cards even though the language set is restricted to make it work on OpenCL.

Cycles is designed to work on the common subset that is shared among OpenCL, CUDA and C. So in that sense, it is essentially designed around the limitations of OpenCL, because CUDA offers more features than OpenCL. There isn’t really anything NVIDIA-specific to the code. It just so happens that NVIDIA Fermi is to date the best architecture to run really complex, divergent code on. I also don’t believe that it will perform well when OpenCL is eventually supported on the 5xxx/6xxx line. And even the 7970 will likely not outperform a GTX580, just as the GTX680 doesn’t.

@Zalamander

So why is the 7970 so much faster in the SmallLuxGPU tests? (it’s about 3/4 down somewhere)

The 7970 is >2x as fast as the GTX580. Why can’t we expect the same with Cycles if AMD gets their drivers in order?

(though I am uncertain this will ever happen, at all, seeing it’s been like this for 10+ years)

Just wondering, how much RAM does your system have and what OS are you using?
It could be that your PC is using the pagefile due to lack of physical memory. That would slow down the OpenCL kernel building process enormously, especially if you don’t have an SSD.

Also have a look here:

I frankly see little reason why a 7970 should have to be slower (let alone much slower) than a 580, or why other 7xxx cards should be slower than their NVidia Fermi counterparts.

@s12a: 7 GB RAM, just like you, and Windows 7 64-bit, like you, on a Core 2 Quad ~ like you, with a card ~ like yours :wink:

Stay tuned, I made a nice step forward !!! hehe !
I’ll be back to you in a few minutes. Don’t get excited, I am the master of “I think I am a genius, er… no I am not” moments!

EDIT: Hey guys! Are you here?

I think I just got rid of the systematic kernel compiling times for both viewport and render!
Now if I start 2.62 fresh and set my viewport to use the GPU, hop, there it goes, no compiling times at all: it goes straight from BVH building to rendering. Same for the actual render!

EDIT: Arf, shit, doesn’t work anymore… Grrr

I was actually waiting for the results!

I just got rid of the systematic kernel compiling times for both viewport and render!
Now if I start 2.62 fresh and set my viewport to use the GPU, hop, there it goes, no compiling times at all: it goes straight from BVH building to rendering. Same for the actual render!
What did you do? Sounds extremely interesting and useful.

Arg…!!! I can’t reproduce it. I hope I am not getting carried away…
I lost track of the steps I took. It has something to do with these additional OpenCL 1.2 files…

EDIT: Man… I can’t reproduce it!
Can it be that the kernel is temporarily put in a location where it can be accessed without recompiling between different Blender sessions?
Made reusable once and for all…

The fact is that I could go to the viewport and render in GPU mode without any delay before actual frame rendering. It happened a few times. Render times were about twice as fast as on the CPU, as usual, and it went directly to rendering.

Man, that’s getting ridiculous. I can’t remember what I did, apart from putting those additional libraries in the AMD APP SDK folder…

EDIT2: no. It starts to silently recompile again… I think I am going nuts…
I let it do its business (13 minutes here). Probably it just had to be built once and then not every time before renders… which would be nice anyway! I can’t remember.

Sorry…

When I install the 12.5 beta or 12.4 drivers on a Radeon HD 6xxx (codename Caicos) and then check in the Catalyst Control Center, I read that the Catalyst version is 12.3. Blender GPU rendering is slow as hell and works only in clay mode. Is this due to the card? Are 12.4+ drivers restricted to the newest cards? Anyone?

To make the drivers work and be correctly recognized I had to remove them completely from the system first. This is what I did on my system:

  • Uninstall old drivers through Control Panel > “Programs and Settings”
  • Right click on your desktop > Screen Resolution > Advanced Settings > Adapter (Should show “AMD Radeon HD 6xxx Series”) > Properties > Driver > Uninstall (800x600 will become the only possible resolution)
  • Reboot
  • Install new drivers -> forced reboot
  • Profit!

The “Driver Packaging Version” for 12.5 beta drivers is “8.97-120418a-137336E-ATI”. If you’ve installed them correctly you should not see an entry versioned 12.x

The 7970 is >2x as fast as the GTX580. Why can’t we expect the same with Cycles if AMD gets their drivers in order?

SLG/Luxmark only has a very small number of fixed, inflexible shaders, whereas Cycles runs a shader interpreter. It is that interpreter that is choking the compiler, and it is that interpreter that is likely creating memory access patterns that work better on the Fermi architecture. Comparing Cycles with any other GPGPU application is really pointless (unless it is architecturally similar) and you will see from different GPGPU benchmarks that their performance can vary drastically between different architectures. (One extreme example would be bitcoin.)

It should put it temporarily in your user temp directory.
Press the “Windows” button on your keyboard, then enter

%temp%

to access it.

EDIT: actually it only puts empty files there, which get created each time an OpenCL Cycles render is invoked. Compiled OpenCL kernels should be in this directory:

C:\Users\xxxxx\AppData\Roaming\Blender Foundation\Blender\2.62\cache
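
One way to check whether the kernel really gets rebuilt on every render would be to watch the timestamps of the files in that cache folder. Below is a minimal sketch, assuming the cache lives under `%APPDATA%` as in the path above; the function name `list_cache` is just something made up for this example, and reading an unchanged timestamp as "not rebuilt" is an inference on my part, not something Blender documents.

```python
import os
import time
from pathlib import Path


def list_cache(cache_dir):
    """Return (name, size_bytes, mtime) for each file in cache_dir, newest first."""
    cache = Path(cache_dir)
    if not cache.is_dir():
        return []
    entries = [
        (p.name, p.stat().st_size, p.stat().st_mtime)
        for p in cache.iterdir()
        if p.is_file()
    ]
    return sorted(entries, key=lambda e: e[2], reverse=True)


if __name__ == "__main__":
    # Assumption: %APPDATA% resolves to C:\Users\<you>\AppData\Roaming on Windows.
    appdata = os.environ.get("APPDATA", "")
    cache_dir = Path(appdata) / "Blender Foundation" / "Blender" / "2.62" / "cache"
    for name, size, mtime in list_cache(cache_dir):
        print(f"{time.ctime(mtime)}  {size:>10} bytes  {name}")
```

Run it before and after a render: if the modification time of the kernel file does not change, the "silent" delay is probably coming from somewhere other than a fresh write to the cache.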

Sorry for the new post, but this deserves it!
I managed to make the BMW Mike-Pan test scene render in 2.63 64bit Win7 Official by replacing the clay render 2.63 OpenCL kernel with the working one from 2.62 (I also used the same name). Of course this won’t guarantee that Cycles features introduced in new versions will be rendered, but it’s ok for basic fully shaded GPU renders!
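
For anyone who wants to try the same swap without losing the original file, here is a small sketch of the copy step. The function name `swap_kernel` and the file names in the usage comment are placeholders (check the actual names in your own cache folders), and it does nothing beyond backing up the 2.63 kernel and copying the 2.62 one over it under the same name.

```python
import shutil
from pathlib import Path


def swap_kernel(working_kernel, target_kernel):
    """Replace target_kernel with working_kernel, keeping a .bak copy of the original."""
    working = Path(working_kernel)
    target = Path(target_kernel)
    if not working.is_file():
        raise FileNotFoundError(f"No working kernel at {working}")

    backup = target.with_name(target.name + ".bak")
    if target.exists() and not backup.exists():
        shutil.copy2(target, backup)  # keep the original so the swap can be undone

    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(working, target)     # copy under the target's own file name
    return backup


# Hypothetical usage; the kernel file names below are placeholders:
# swap_kernel(r"...\Blender\2.62\cache\<working kernel file>",
#             r"...\Blender\2.63\cache\<clay-render kernel file>")
```

To undo the swap, copy the `.bak` file back over the kernel, or simply delete the kernel so that Blender builds its own again.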

Proof (well, sort of):

http://i.imgur.com/CRFydl.jpg
(click to enlarge. Link: http://i.imgur.com/CRFyd.jpg)

What’s more, the actual render process started at minute 3:23 (due to hidden kernel compilation), so it was a very quick render!

55 seconds with an AMD Radeon 7770! As should be expected from benchmarks of its GPU computing performance on other websites!

!!

Thanks to Gwenouille for the inspiration :wink:
(That is, toying with Cycles compiled kernel files!)

Now, if only we could get rid of the initial silent kernel compilation process…