GTX 1070 way slower than GTX 660!

Just upgraded from a GTX 660 (2 GB) to a GTX 1070 (8 GB), hoping for a massive improvement in render time and capacity. Before inserting the new card, I ran the BMW27.blend benchmark and repeatedly came in at around 03:06. After installing the 1070, a first benchmark of the same file came in at 09:48!

And that is not all. With the 660, the benchmark used 1 thread. With the 1070, it uses 8 threads. If I change Auto-detect to Fixed with 1 thread, it takes roughly 50 minutes.

These results are unacceptable of course. How can I troubleshoot this to find the bottleneck?

Make sure that you are actually rendering with the GTX 1070 and not the CPU. The times you are referring to are very likely CPU render times.

For some reason you are almost certainly rendering with the CPU.
Make sure the NVIDIA drivers are properly installed, and that your GPU is selected under User Preferences > System and also in the Render tab:
https://i.stack.imgur.com/W7X2W.png

Thank you. Checking the GPU setting on the Render tab was my first instinct, and the benchmark file also came pre-configured to use GPU. GPU Compute was indeed selected. But it turns out I had missed a setting on the System tab in User Preferences. Having CUDA selected is not enough; one also needs to have the button below activated.
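
For anyone who wants to check the same thing without hunting through the UI, here is a minimal sketch for Blender's Python console. It assumes the 2.78/2.79 layout, where the Cycles device settings live in the add-on preferences (`user_preferences.addons['cycles'].preferences`, with `compute_device_type` and a `devices` collection whose items have `name`, `type` and `use`). Those names are from memory and other versions differ, so treat them as assumptions. The helper name `enable_cuda_devices` is made up for this post.

```python
def enable_cuda_devices(cycles_prefs):
    """Select CUDA and tick every CUDA device; return the names that were enabled.

    cycles_prefs is assumed to expose compute_device_type and a devices
    collection whose items have .name, .type and .use (2.78/2.79 layout).
    """
    cycles_prefs.compute_device_type = 'CUDA'
    enabled = []
    for device in cycles_prefs.devices:
        # Equivalent of clicking the device button under the CUDA selector.
        device.use = (device.type == 'CUDA')
        if device.use:
            enabled.append(device.name)
    return enabled


try:
    import bpy  # only importable inside Blender
except ImportError:
    bpy = None

if bpy is not None:
    prefs = bpy.context.user_preferences.addons['cycles'].preferences
    print("CUDA devices enabled:", enable_cuda_devices(prefs))
    # The per-scene switch on the Render tab (GPU Compute).
    bpy.context.scene.cycles.device = 'GPU'
```

If the printed list is empty, the card is not visible to Cycles at all, which points at the driver rather than at a missed setting.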


Yes, that button can be a bit confusing.
In the 2.79 test build, or in the builds downloadable from https://builder.blender.org/download (the first links), the button has been changed to a checkbox.

So … an update to the render times (just for information)?

Of course. The new render time is 01:06, so it’s a 65% improvement. In case it’s of interest, the rest of my PC is at least 5 years old (except the RAM, which I recently upgraded to 16 GB). I’m not sure what difference a new CPU and motherboard would make, but I guess not much. My CPU is an Intel Core i7 920 @ 2.67 GHz.
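
As a quick sanity check on that figure, the two GPU times from this thread (03:06 on the 660, 01:06 on the 1070) can be compared directly. This is only a sketch of the arithmetic; `to_seconds` is a helper made up for the example.

```python
def to_seconds(mmss):
    """Convert a 'mm:ss' render time into seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


before = to_seconds("03:06")  # GTX 660
after = to_seconds("01:06")   # GTX 1070

reduction_pct = (before - after) / before * 100  # share of render time saved
speedup = before / after                         # how many times faster

print(f"time saved: {reduction_pct:.1f}%, speedup: {speedup:.2f}x")
```

The same result can be read two ways: roughly 65% less render time, or a bit under 3x faster.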

Something is not right. I have a Ryzen 5 1600 running at stock, and my CPU renders the file (that YAFU linked to) in 6:53 using Blender 2.78c. Moreover, I even optimized tile size to 16x16 for a best case. (If I just hit F12 using the GPU scene, my CPU took 12:23 because the large tile sizes left many threads idle.) Based upon this benchmark by Gamers Nexus and your time of 9:48, I should be getting 4:12 at most since your time is likely not a best case regarding tile size. Based upon my render time, I would expect the i7 920 running at stock to render the scene in at least 16:05.
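
The estimate above is a simple proportional scaling, which can be sketched as follows. The ratio here is back-derived from the two times quoted in this post (9:48 and 4:12), not read from the Gamers Nexus benchmark itself, so treat it as an assumption. The helper names are made up for the example.

```python
def to_seconds(mmss):
    """Convert an 'm:ss' render time into seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


def to_mmss(seconds):
    """Format a number of seconds as 'm:ss', rounded to the nearest second."""
    total = round(seconds)
    return f"{total // 60}:{total % 60:02d}"


i7_920_reported = to_seconds("9:48")  # time reported earlier in the thread
ryzen_expected = to_seconds("4:12")   # what the Ryzen 5 1600 "should" get
ryzen_measured = to_seconds("6:53")   # what the Ryzen 5 1600 actually got

# Assumed relative speed of the two CPUs, implied by the first two numbers.
ratio = i7_920_reported / ryzen_expected

# Scale the measured Ryzen time by the same ratio.
i7_920_expected = ryzen_measured * ratio

print(f"ratio: {ratio:.2f}, expected i7 920 time: {to_mmss(i7_920_expected)}")
```

With these inputs the ratio comes out near 2.33 and the scaled time lands within a second or so of the 16:05 quoted above; the small gap is just rounding in the ratio.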

So, I am curious whether you changed any of the settings. My total sample count is 1024 AA (from the Sampling panel), and my resolution is 50% of 1920x1080. Again, I chose a tile size of 16x16 as a best case, since my time was horrible with a tile size of 256x256. I would appreciate your help figuring this out. Thank you.
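
To illustrate why large tiles hurt a CPU render at this resolution, here is a rough sketch that counts tiles on a plain grid. It assumes tiles are laid out in a simple grid with partial tiles at the edges (Cycles' real layout may differ in detail) and that the Ryzen 5 1600 renders with 12 threads (6 cores, 2 threads each). `tile_count` is a made-up helper.

```python
import math


def tile_count(width, height, tile):
    """Number of tiles in a simple grid, counting partial edge tiles."""
    return math.ceil(width / tile) * math.ceil(height / tile)


width, height = 1920 // 2, 1080 // 2  # 50% of 1920x1080
threads = 12                          # assumption: Ryzen 5 1600, 6C/12T

for tile in (256, 16):
    tiles = tile_count(width, height, tile)
    print(f"{tile}x{tile}: {tiles} tiles, {tiles / threads:.0f} per thread")
```

At 256x256 there is only about one tile per thread, so a thread that finishes a cheap tile has nothing left to pick up while the expensive tiles are still rendering. At 16x16 there are thousands of tiles, so every thread stays busy until nearly the end.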

@AustinC, I am not sure I understood you correctly, but I don't think I shared any scene in this thread. Maybe in another thread?
I am pretty sure the scene used here is BMW27.blend from the link below (you are apparently using the blender.org scene):
https://blenderartists.org/forum/showthread.php?239480-2-7x-Cycles-benchmark-(Updated-BMW)

Yes, you are correct. Thank you.