Cycles performance on Tesla V100

I’ve been struggling with my Cycles render times - I just completed an animation that took over 80 hours to render 90 s of video on my RTX 2080 SUPER + GTX 970.

So I wondered if I could get better performance on AWS EC2, which offers some beefy GPUs. I thought I’d share my results, as they’re not as impressive as I’d hoped given the price tags of the K80 and V100 GPUs.

Time to render reference frame (1920 x 1080, 512 samples):
My PC: 2m41s
p2.xlarge (Tesla K80): 8m52s
p3.2xlarge (Tesla V100): 2m00s

Is this as expected? Any hints for better performance on EC2? The costs to render an animation could be quite significant on AWS (for me, a hobbyist, not Pixar).

Interestingly, I got this working with AWS Batch. It’s quite a neat workflow: my local script takes the .blend file path and the render parameters (scene, size, samples, frames etc.), uploads the .blend to S3, and submits an AWS Batch job. AWS Batch takes care of creating the ECS clusters, EC2 instances and so on, and scales up and down (to zero) as needed, depending on how many jobs you have submitted and your compute environment setup. It uses Spot Requests to minimize EC2 costs. Then I just pick up the results from a different S3 bucket.
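To make the workflow concrete, here’s a minimal sketch of what the local script’s submit step could look like in Python with boto3. The bucket, queue, and job-definition names, the environment variable names, and the container entrypoint contract are all placeholders I’ve made up, not the actual solution:

```python
def build_job_overrides(blend_key, scene, frame_start, frame_end,
                        samples, width, height):
    """Build the containerOverrides payload for an AWS Batch submit_job call.

    The (hypothetical) container entrypoint is expected to download the
    .blend from S3 and run Blender using these environment variables.
    """
    return {
        "environment": [
            {"name": "BLEND_S3_KEY", "value": blend_key},
            {"name": "SCENE", "value": scene},
            {"name": "FRAME_START", "value": str(frame_start)},
            {"name": "FRAME_END", "value": str(frame_end)},
            {"name": "SAMPLES", "value": str(samples)},
            {"name": "RESOLUTION", "value": f"{width}x{height}"},
        ]
    }


def submit_render_job(blend_path, **params):
    """Upload the .blend to S3 and submit one Batch job.

    Requires AWS credentials; bucket/queue/definition names are placeholders.
    """
    import boto3

    s3 = boto3.client("s3")
    batch = boto3.client("batch")

    key = f"input/{blend_path.rsplit('/', 1)[-1]}"
    s3.upload_file(blend_path, "my-render-input-bucket", key)

    return batch.submit_job(
        jobName="cycles-render",
        jobQueue="render-queue",          # hypothetical queue name
        jobDefinition="blender-render",   # hypothetical job definition
        containerOverrides=build_job_overrides(key, **params),
    )
```

AWS Batch then handles the scaling; the container picks its inputs out of the environment and writes results back to the output bucket.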

The render actually runs in a Docker container on the EC2 instance; I believe this uses NVIDIA Docker, and I have confirmed it is actually using the GPU to render.

Are you comparing your RTX 2080S + GTX 970 + CPU vs a single Tesla V100?

If that’s the case, I’d say Tesla V100 seems quite impressive.

Have you tried distributing one frame per GPU, or one frame per processing unit on your system? That’s faster and further cuts down the rendering time.

Do you know the Brenda scripts?


Just (RTX 2080S + GTX 970) vs the single Tesla V100. I found that using the CPU actually slows the render down, as the GPU waits for the last CPU tiles to finish, and those take much longer than the GPU tiles.

You can get more Tesla V100s (up to 16 in a single instance), but the cost increases linearly. I was just looking at the capacity of a single unit; then we can extrapolate the potential speed of multiple units in parallel - splitting frames across GPUs and instances.

It will be possible to get massively reduced times by parallel processing - but it’s going to cost me :slight_smile:
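The extrapolation above can be sketched with some rough arithmetic. Note the frame rate of the 90 s animation isn’t stated, so 25 fps is an assumption, and real scaling won’t be perfectly linear:

```python
# Rough extrapolation from the single-V100 benchmark (2m00s per frame).
# Assumes 25 fps (not stated in the thread) and ideal linear scaling.
seconds_per_frame = 120           # 2m00s per frame on one Tesla V100
frames = 90 * 25                  # 90 s of video at an assumed 25 fps
total_gpu_hours = frames * seconds_per_frame / 3600

for gpus in (1, 8, 16):
    wall_hours = total_gpu_hours / gpus
    print(f"{gpus:2d} GPU(s): ~{wall_hours:.1f} h wall time, "
          f"{total_gpu_hours:.0f} GPU-hours billed")
```

The point being: parallelism shrinks the wall-clock time dramatically, but the billed GPU-hours stay the same either way, which is exactly the “it’s going to cost me” part.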


No, I hadn’t come across them - thanks. I note these are 5 or 6 years old, from before AWS Batch was introduced.

It looks good, but the AWS Batch approach seems more elegant - no need to start or stop instances/nodes yourself. The only thing my solution lacks, as far as I can see, is fault tolerance if the spot instances are terminated, but you can set a ‘retry count’ on the job, which should deal with this.
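For reference, a minimal sketch of what that retry policy looks like on a Batch job, using boto3-style submit_job parameters (the queue and definition names are placeholders):

```python
def job_request(job_name, job_queue, job_definition, retry_attempts=3):
    """Parameters for an AWS Batch submit_job call with a retry policy,
    so a job interrupted by Spot termination is simply re-run.

    Queue and job-definition names here are placeholders.
    """
    return {
        "jobName": job_name,
        "jobQueue": job_queue,
        "jobDefinition": job_definition,
        # Batch re-submits the job up to this many times if the container
        # exits abnormally (e.g. its Spot instance was reclaimed).
        "retryStrategy": {"attempts": retry_attempts},
    }
```

With frame-level jobs, a reclaimed instance only costs you the frames that were in flight, which then get retried.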

I have been thinking of open-sourcing my solution, but there are still a few places where my S3 bucket names are hardcoded, and really I want to create a CloudFormation template to set up the AWS Batch resources.

Thanks again for the link to those scripts.


You don’t have to stop them yourself in Brenda either. They terminate automatically if there are no more frames left to render.

However, your approach seems very interesting, and I’d love to try it if you’re willing to open source your solution.

Yes, sure - I’ll tidy it up, document what you need to set up in AWS, and then upload it to GitHub - hopefully this week.


Fantastic. Thank you very much. :smiley:

Hi, I finished the CloudFormation template and pushed my solution to GitHub:

There’s a README file in there which should explain how to get it up and running. Let me know your thoughts; I’ll probably start a new thread on here to tell people about it. It’s currently configured not to use GPU instances, as the spot prices were really high this morning, but you can see the commented-out GPU requirements for jobs.
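For anyone wondering what re-enabling GPUs involves: Batch job definitions declare GPU needs via `resourceRequirements`. A sketch, with placeholder image name and sizes (not the actual template’s values):

```python
def gpu_container_properties(image, gpus=1, vcpus=8, memory_mib=60000):
    """Sketch of containerProperties for an AWS Batch job definition
    that requests GPUs - the kind of section that gets commented out
    to fall back to CPU-only instances.

    Image name, vCPU, and memory figures are placeholders.
    """
    return {
        "image": image,
        "resourceRequirements": [
            # Requesting GPUs steers the job onto GPU instance types
            # (e.g. p2/p3 families) in the compute environment.
            {"type": "GPU", "value": str(gpus)},
            {"type": "VCPU", "value": str(vcpus)},
            {"type": "MEMORY", "value": str(memory_mib)},
        ],
    }
```

Dropping the `GPU` entry (or commenting out the equivalent lines in the CloudFormation template) lets the jobs run on cheaper CPU instances when spot prices spike.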