Boltzmann isn’t about building a CUDA compiler. It’s a platform to help port CUDA code (many GPGPU projects have already been done in CUDA), so that investments already made can also make use of OpenCL technology, and not just on AMD either.
It just helps convert CUDA to a portable C++ GPGPU alternative that, by most accounts, runs just as fast as the original CUDA if done right.
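For a rough feel of what that kind of conversion involves, here’s a toy Python sketch that renames a handful of CUDA runtime calls to their HIP counterparts. It is only an illustration: the function name `toy_hipify` and the tiny mapping table are made up for this post, and the real porting tools handle far more than simple renaming.

```python
import re

# Small sample of CUDA runtime names and their HIP counterparts.
# Assumption: this table is illustrative only; real tooling covers
# far more of the API than these few entries.
CUDA_TO_HIP = {
    "cuda_runtime.h": "hip/hip_runtime.h",
    "cudaMalloc": "hipMalloc",
    "cudaMemcpy": "hipMemcpy",
    "cudaMemcpyHostToDevice": "hipMemcpyHostToDevice",
    "cudaMemcpyDeviceToHost": "hipMemcpyDeviceToHost",
    "cudaFree": "hipFree",
    "cudaDeviceSynchronize": "hipDeviceSynchronize",
}

# Match whole names only, so unrelated identifiers that merely contain
# a CUDA name (e.g. my_cudaMalloc_wrapper) are left alone.
_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(name) for name in CUDA_TO_HIP) + r")\b"
)


def toy_hipify(source: str) -> str:
    """Hypothetical helper: rename known CUDA names in a source string."""
    return _PATTERN.sub(lambda m: CUDA_TO_HIP[m.group(1)], source)


if __name__ == "__main__":
    cuda_snippet = (
        "#include <cuda_runtime.h>\n"
        "cudaMalloc(&d_buf, n);\n"
        "cudaMemcpy(d_buf, h_buf, n, cudaMemcpyHostToDevice);\n"
        "cudaFree(d_buf);\n"
    )
    print(toy_hipify(cuda_snippet))
```

The point is just that most of the host-side API maps across one to one, which is why existing CUDA projects can be ported without a rewrite.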
Otoy have an OpenCL version of their engine but have chosen to use CUDA as their platform because Apple and Nvidia don’t even try to support OpenCL properly anymore. Hence, in their eyes, supporting OpenCL is a waste of time, as those platforms refuse to keep up to date with the standards.
Rather than moaning about AMD, maybe Nvidia should be made to support platforms other than CUDA properly, to give their customers better choice and support.
And if you want to talk about innovation in GPU compute, just look at AMD’s new offerings: a whole new GPU memory system design that allows GPUs to access up to 512 terabytes of not just graphics memory but SSD and main system memory, Pro cards with 1 TB memory support, etc. That is the way forward. Why would AMD at this point even care about CUDA?
OpenCL, IF supported correctly, can do anything CUDA can. For example, Cycles OpenCL still sticks to OpenCL 1.1 I think, at best 1.2.
OpenCL 2.1-2.2 is a whole new beast that, once supported, is far more the equal of Nvidia’s CUDA, but to keep compatibility with older tech, OpenCL 1.2 is mostly used at this point.
Here’s a test branch I did a while back that I’m not working on anymore due to accidentally deleting the code (beer can be a distraction).
It’s for OpenCL users:
Denoise branch V4 from Lukas. For OpenCL users this wouldn’t compile kernels even with denoise not activated, so I built a branch that you can use with CPU denoise while GPU OpenCL rendering still works, but MAKE SURE DENOISE IS OFF or you get a blue screen of death. As far as I know, once these patches are added, OpenCL Cycles will do everything CUDA Cycles can, but faster I’ve been told, as SSS on CUDA is slower than CPU, yet this is about 20% faster.
Also added OpenCL GPU SSS and volumes, approx AO for lighting after a set GI bounce level, a compositor anti-aliasing node and a few other goodies. Have to start from scratch though, as I deleted this branch by accident.
New old build: https://mega.nz/#!FoIiBYoS!qlZaylmDOMRBrjB3Quo2oI4lO5SAXiXK7hWHIbs 2mO4
Also, this is a quick, nasty test of AMD’s Radeon Rays intersection API that powers Radeon ProRender. It’s CRAZY fast: