The closest example for comparison will be the next-gen consoles. They will demonstrate how relatively modest hardware, on paper, can outperform much more powerful hardware. The APU and shared memory will eliminate so many of the inefficiencies of negotiating the PCIe bus that the consoles will do far more with much less.
Apple will do the exact same thing and make sure all their APIs hit the metal as efficiently as possible. I can see video editing on Apple Silicon being absolutely next level, as for the first time the benefits of heterogeneous computing come to fruition. All those inefficiencies of moving data from main memory to GPU memory and back are gone, and the best type of compute can be chosen for any task, whether GPU, CPU, or the fixed-function hardware that's also in the SoC, like codecs and machine-learning hardware.
Apple Silicon will deliver on the promise of OpenCL, which, now transformed into Metal, will have the benefit of actual heterogeneous hardware to run on. Unlike the consoles, which are updated every few years, Apple will likely update their SoCs yearly, like the iPads. They don't have to wait for Intel or AMD to produce new hardware; they are masters of their own destiny.
The biggest problem Apple will probably have to overcome is that the hardware will not look as impressive on paper as a PC, but the real-world performance will be superior, and of course it won't be a space heater.
That's with regard to the integrated GPGPU in Apple Silicon, relative to Intel's iGPUs. Discrete GPUs from AMD and Nvidia have been tile-based deferred renderers for several years now.
Apple will expand their use in the Mac Pro line, not deprecate their support. They didn't pour four years of heavy investment into the Mac Pro line just to shit-can it for a low-power Apple Silicon SoC.
The Apple Mac Pro with Silicon is a good three years out.
I think OptiX = being locked into a closed, proprietary technology that Nvidia fully owns and controls. Right now Nvidia is the nice guy, but who knows about the future?
I'm particularly looking forward to the Big Navi release and hope AMD will give me an option for Cycles rendering. Can you say more about how you're going to improve performance and possibly surpass GPUs from other vendors? What optimisations are you hoping to bring, or what do you propose to do differently, in layman's terms?
Thanks for taking this task on, it looks incredibly well timed.
It's too early to talk about performance. There's nothing to execute on the GPU yet.
There are two major things Cycles does: ray intersection and shader evaluation. Performance depends on simultaneous execution and memory access. Simultaneous execution can go very badly with the current compiler, especially for SSS and volumes, and for complex shaders. My compiler will first ensure the most-visited code (ray intersection) is executed together, via a memory trade-off. Second would be grouping similar shaders together; this worked well on the "classroom" benchmark scene.
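A toy sketch of why grouping similar shaders together helps on a GPU (this is not Cycles code; the SIMD width and shader IDs are made up for illustration). GPUs execute work items in lockstep groups, so a group that mixes several shaders pays for each one in turn; sorting work items by shader ID first means most groups run a single shader:

```python
# Hypothetical model of SIMD divergence: count how many distinct shaders
# each warp-sized chunk of work items would have to execute.
WARP = 4  # assumed SIMD width, for illustration only

def divergence(work, warp=WARP):
    """Sum of distinct shader IDs per warp; lower = less divergence."""
    chunks = [work[i:i + warp] for i in range(0, len(work), warp)]
    return sum(len(set(chunk)) for chunk in chunks)

# Work items tagged with the shader each ray hit (interleaved = divergent).
work = [0, 2, 1, 2, 0, 1, 2, 0, 1, 0, 2, 1]

before = divergence(work)          # every warp mixes all three shaders
after = divergence(sorted(work))   # each warp now runs a single shader

print(before, after)  # 9 3
```

With the interleaved order every 4-wide group must step through three different shaders; after grouping, each group runs exactly one, which is the kind of win shader grouping is after.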
The second limitation is memory access, and especially random memory access. Code has to wait for data to become available, which is somewhat mitigated by the cache. Code also has to wait for the next instructions to become available.
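A minimal sketch of one standard mitigation for random memory access (my illustration, not something the post specifies): sorting the gather indices before fetching turns a random walk over memory into a mostly sequential sweep, which the cache handles far better, while fetching exactly the same set of values:

```python
import random

random.seed(0)
data = [float(i) for i in range(10_000)]
idx = [random.randrange(len(data)) for _ in range(1_000)]

# Random-order gather: successive accesses may land on cold cache lines.
gathered = [data[i] for i in idx]

# Sorting the indices first makes the same gather cache-friendly;
# the fetched values are identical as a multiset, only the order changes.
gathered_sorted = [data[i] for i in sorted(idx)]

assert sorted(gathered) == gathered_sorted
```

In a real renderer the "indices" would be things like BVH node or texture addresses, and the reordering has to be weighed against the cost of the sort itself, which is the kind of memory trade-off mentioned above.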
I expect my compiler to outperform other vendors' GPUs on complex scenes. Volume rendering would also be blazing fast compared to now (GPU comparison only).
Hi Nirved! Is your project generation-specific (Navi), or should we expect a general uplift in performance across all (reasonably recent) generations of cards?