2.8 Branch update with Cycles CPU/GPGPU rendering together

(Marc Driftmeyer) #1


CPU rendering will be restricted to a BVH2, which is not ideal for raytracing
performance but can be shared with the GPU. Decoupled volume shading will be
disabled to match GPU sampling.
Thread priority or GPU sync tweaks are likely needed to improve performance,
but might as well post the patch for testing already. Perfect scaling is not
going to happen due to BVH2 usage though.
Go to User Preferences > System to enable the CPU to render alongside the GPU.

(animani) #2

Looks promising. Let’s see how it goes.

(razin) #3

Can we have a link to download the branch, please?

(Database_Scene) #4

Is this compatible with SMP (dual-Xeon platforms, for example)?

(Marc Driftmeyer) #5

It’s been approved: https://developer.blender.org/D2873

I have yet to see it pulled into master. I don’t build branches with specific diff changes, so I don’t know exactly how they branch, test, and re-merge such changes. I’m not interested in brushing up on git merge and working on several local copies.

I would assume that once they recognize the CPU, its limitations would dictate the threads available for OpenCL-based CPU kernels to leverage.

/* Fallback to standard device name API. */
if(name.empty()) {
    name = get_device_name(device_id);
}
/* Distinguish from our native CPU device. */
if(get_device_type(device_id) & CL_DEVICE_TYPE_CPU) {
    name += " (OpenCL)";
}
return name;


(juang3d) #6

Will this be merged in current master? Or only 2.8?

(Marc Driftmeyer) #7

I believe it is going into master first, as the source code being affected is the 2.79/master branch. I’m seeing nothing in properties.py reflecting this feature.

Even though it’s gotten the greenlight I still don’t see it merged.

(Marc Driftmeyer) #8

From what I can tell, this is going into the next 2.79 revision but is being refined at the moment.

/** \file BKE_blender_version.h
 *  \ingroup bke
 */

/* these lines are grep'd, watch out for our not-so-awesome regex
 * and keep comment above the defines.
 * Use STRINGIFY() rather than defining with quotes */
#define BLENDER_VERSION 279
/* Several breakages with 270, e.g. constraint deg vs rad */
#define BLENDER_MINVERSION 270

/* used by packaging tools */
/* can be left blank, otherwise a,b,c… etc with no quotes */
/* alpha/beta/rc/release, docs use this */

extern char versionstr[]; /* from blender.c */


Right now it’s the following issue:

Viewport rendering of the BMW scene from the official benchmark pack takes 12 seconds on a 1080 Ti, 20 seconds on a Vega 64, and 16 seconds using both. With F12 render it is the opposite: the Vega is faster at 82 seconds (at 128x128, best time), the 1080 Ti takes 93 seconds (at 16x16, best time), and both together take 44 seconds, using latest master with initial_num_samples at 5000.
To sum up:

  • Viewport rendering seems really slow with OpenCL in latest master. 2.78c with selective node compilation for the viewport renders nearly 2x faster on the Vega 64. It’s not due to SSS or volume, as those are not compiled into the viewport kernel either. I can investigate that.
  • Multi-device rendering is slower with viewport/progressive rendering than the fastest device alone. Logic would be to wait for the slowest half to finish, which would be around 10 seconds for the Vega?
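As a rough check on the numbers quoted above, here is a minimal sketch of two simple work-splitting models. It assumes the single-device times scale linearly with the share of work and ignores all synchronization overhead; the function names are made up for this illustration and are not part of Cycles.

```python
def equal_split_time(*device_times):
    """Each device gets the same share of the work; the slowest one decides."""
    return max(device_times) / len(device_times)


def ideal_combined_time(*device_times):
    """Work is shared in proportion to device speed, so all finish together."""
    return 1.0 / sum(1.0 / t for t in device_times)


# Single-device times in seconds, taken from the post above.
viewport_equal = equal_split_time(12, 20)      # 1080 Ti, Vega 64
viewport_ideal = ideal_combined_time(12, 20)
f12_ideal = ideal_combined_time(93, 82)

print(f"viewport: equal split {viewport_equal:.1f} s, "
      f"ideal {viewport_ideal:.1f} s, reported 16 s")
print(f"F12: ideal {f12_ideal:.1f} s, reported 44 s")
```

Under these assumptions, the reported F12 time of 44 seconds sits very close to the proportional-split estimate, while the reported 16-second viewport time is worse than even the equal-split estimate of 10 seconds mentioned in the post.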

I’m glad it’s not going into 2.8 first. Let us bang on it and file bugs to test, and then later roll it up into the big future release.

(Grzesiek) #9

That is quite promising… my system will finally get a full workout when rendering…

(Esparadrapo) #10

Crossfit for computers.

(LazyDodo) #11

Anything currently being committed to master has no guarantee whatsoever that it will end up in 2.79a/b/c. When the time comes for 2.79a, we will look through all commits, cherry-pick the needed bug fixes, and transfer them over to the 2.79 branch. New features, and especially risky and/or compatibility-breaking changes, generally don’t make the cut. But really, until we sit down and sift through the commits, nobody can know for sure.

(juang3d) #12

I just did a test with a BB Build and the result is AMAZING!!

In a test scene I used, I got 9:18 for the GPU-only render (GTX 1080) and 5 minutes (I don’t remember the seconds right now) for GPU+CPU… this is AWESOME!!!
Around a 40% improvement.

Here is the test scene rendered with GPU+CPU:

(esimacio) #13


(esimacio) #14

It’s here: https://builder.blender.org/download/blender-2.80-0e7113d-win64.zip (sorry for double post)

(esimacio) #15

Hmm, if I am not mistaken, combining CPU + GPU is not meant to make renders faster; technically the render time should be about the same. I think the combination is for saving memory. I often run out of memory with my 4GB card, and with CPU + GPU this should no longer be a problem; the render time should be the same as with GPU only.

Correct me if I am wrong please.

(SunBurn) #16

I think esimacio is right.

The 2.79 GPU + CPU builds always give me slower results (sometimes close to GPU, but only if I set my tile size to 32x32).

But on scenes where my GTX 970 memory is limiting my GPU-only render, I can go for all three (970, 1060 + i7) using CPU+GPU.

Unfortunately, today I get weird pinkish results (only in CPU+GPU), but I’ll investigate further.

(SterlingRoth) #17

I’m also getting a 40% speed boost on most scenes I throw at it, and most of those aren’t very memory intensive.

(BigBlend) #18

It’s only faster with smaller tiles, and that is not optimal for AMD cards, which use big tiles.

(brecht) #19

This is wrong, it’s about improving render time only. You are still limited by the memory of the GPU; solving that is something else.

As others explained, you need to use small tiles to get the render time reduction. We have done some optimizations to render small tiles faster and more are planned, with the goal of eventually removing the manual tile size setting entirely to better balance CPU and GPU work.
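To illustrate why small tiles matter when devices of different speed share a render, here is a toy simulation. It is only a sketch under stated assumptions: every tile costs the same, an idle device simply takes the next tile, and there is no per-tile overhead (in practice GPUs lose efficiency on very small tiles, which this model ignores). The speeds are hypothetical and `render_time` is not a Cycles function.

```python
import heapq


def render_time(num_tiles, device_speeds, total_work=1.0):
    """Greedy scheduling: each device takes the next tile as soon as it is idle."""
    tile_work = total_work / num_tiles
    # Heap of (time at which the device becomes idle, device index).
    idle_at = [(0.0, i) for i in range(len(device_speeds))]
    heapq.heapify(idle_at)
    finish = 0.0
    for _ in range(num_tiles):
        start, dev = heapq.heappop(idle_at)
        end = start + tile_work / device_speeds[dev]
        finish = max(finish, end)
        heapq.heappush(idle_at, (end, dev))
    return finish


# Hypothetical speeds: a GPU, and a CPU that is half as fast.
speeds = [1.0, 0.5]
for tiles in (2, 4, 256):
    print(tiles, "tiles:", round(render_time(tiles, speeds), 3))
```

With only two large tiles the slow device holds the last tile and the render is no faster than the GPU alone; with many small tiles the total approaches the proportional-split limit of `total_work / sum(speeds)`.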

(Dito) #20

Yep, around 40% speed boost for me too.

Test scene with 1280 Samples:
Only GPU - 8min 53sec
GPU+CPU - 5min 29sec
Xeon E3 1230 4x3.3GHz

Tilesize Test:
128 Samples
64 Tiles = 39sec
16 Tiles = 55sec

Same Scene only change Samples to 1280:
64 Tiles = 5min 45sec
16 Tiles = 5min 29sec
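The "around 40%" figure can be read in two ways, so here is a small worked calculation using only the times posted above. The helper name is made up for this sketch.

```python
def to_seconds(minutes, seconds):
    """Convert a min/sec render time to seconds."""
    return 60 * minutes + seconds


gpu_only = to_seconds(8, 53)   # 1280 samples, GPU only
gpu_cpu = to_seconds(5, 29)    # 1280 samples, GPU+CPU

time_saved = 1 - gpu_cpu / gpu_only   # fraction of render time removed
speedup = gpu_only / gpu_cpu          # throughput ratio

print(f"render time reduced by {time_saved:.1%}")
print(f"speedup factor {speedup:.2f}x")
```

So the same pair of timings is roughly a 38% shorter render, or equivalently about 1.6 times the throughput of the GPU alone.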