Blender 2.62 - Cycles: second card only used at half power.

Hello.
My configuration is:

Processor: Intel Core i7 2600 @ 4.2 GHz
GPU: 2 x GeForce GTX 580 (SLI mode).

Rendering with Cycles is very fast, but when I monitor GPU usage with some utilities, I notice the second card barely goes above 50%.
The first card renders at 99%, while the second card sits at 48-52% of its power.

How can I solve this problem? I have rendered several different scenes with the same result.

Thank you.

OS? RAM?

This is probably the limit of your hardware and software, especially if you are running Windows.

In the latest AMD and Intel CPUs (or, in better terms, APUs) the memory controller is integrated directly into the die. This means that most of the limits of your hardware configuration depend directly on the type of APU you choose; the motherboard is not the big deal it was in the past, especially considering how the x86 architecture operates.

You can never reach the nominal numbers anyway, and memory management and drivers under Windows are literally a mess when it comes to performance and full use of your system resources, which can make you lose another fraction of your system's power.

I can recommend a GNU/Linux distribution if you want to squeeze your hardware and get better resource management. Also, SLI mode is not the best choice, for two simple reasons:

  • the drivers from Nvidia are usually not really optimized for it
  • SLI relies heavily on the motherboard BIOS and on the chipset to work perfectly

In the end, the best way to invest your money is in a single high-end video card rather than in an SLI setup.

Also try to avoid fancy OSes like Windows and Mac OS; they waste your resources tracking everything you do on your desktop, and they simply do not use 100% of your hardware for the user experience.

Try to update your drivers, especially the ones from Intel, but do not expect too much from this kind of OS.

The OS is Windows 7 Ultimate 64-bit, RAM is 8 GB DDR3 1600 MHz. Drivers are up to date.
Sorry for saying this, but you have many (stupid) prejudices about operating systems. Prejudice without technical knowledge is fanboyism, especially when you talk about memory managers, which have nothing to do with this problem. I'm looking for technical answers to problems; calling an operating system "fancy" is not the kind of answer I'm looking for. FYI, SLI works perfectly in other applications (and yes, under Windows 7). LuxMark is an example of an application that can squeeze all the available power using SLI for rendering. So your answer is inappropriate, biased and technically irrelevant. Even professional cards use SLI nowadays, and the CUDA software is the same as that used in the professional-card world, backported to the consumer world. I really doubt the problem would be solved by a dual Quadro configuration; the driver code that manages SLI is the same, really.

I just want to find out whether there are problems in the Cycles load balancer or whether this is a common issue. No, operating systems are not involved in this problem, sorry. People should just stop seeing Linux as the solution for every problem. It's not. Linux won't fix bugs or inefficiencies in application code. Moreover, the CUDA (and, in a certain sense, OpenCL) drivers for Nvidia are among the best drivers on the market, and the Linux drivers lag behind the Windows version. AMD/ATI drivers are not even an option. This is a fact, not an opinion. I have been a Linux user for 14 years (and I still use it); its advantages are certainly not in the area of general-purpose software like Blender. Maybe fedoraforum.org or ubuntuforums.org are more appropriate for your advertising needs.

Well, since you’re talking all fancy about technology:
GPU raytracing uses multiple GPUs, which has absolutely nothing to do with SLI. SLI is solely used in the rendering pipeline of OpenGL/DirectX.
I’d start by disabling SLI in your NV Control Panel; if that doesn’t help, remove the SLI bridge as well.
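 
Not Cycles' actual code, just a minimal CUDA sketch of that point, to show that the compute runtime enumerates every physical GPU as its own device whether or not an SLI bridge is installed (file name and output format are my own, for illustration):

    // Minimal sketch: how a CUDA compute application sees the GPUs.
    // Each physical card shows up as its own device whether or not
    // an SLI bridge is installed. Compile with: nvcc devinfo.cu -o devinfo
    #include <cstdio>
    #include <cuda_runtime.h>

    int main() {
        int count = 0;
        cudaGetDeviceCount(&count);            // SLI does not merge devices here
        for (int i = 0; i < count; ++i) {
            cudaDeviceProp prop;
            cudaGetDeviceProperties(&prop, i);
            printf("Device %d: %s, %d SMs, %.0f MHz\n",
                   i, prop.name, prop.multiProcessorCount,
                   prop.clockRate / 1000.0);   // clockRate is reported in kHz
        }
        return 0;
    }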

No professional card uses SLI for CUDA; it was only ever a gaming feature. Only recently have there been SLI-ready Quadros, and the only application for SLI with Quadro cards is to boost viewport performance, for instance to drive a stereoscopic display over a maximum of 8 screens with a lot of juice behind it, to run a real-time OpenGL model with a ridiculous amount of polygons, or to visualize point clouds in medical applications.

Also, a Quadro card is much slower than a GeForce at CUDA in SP (single precision), and even a Tesla is slower than a GeForce in SP, as both have fewer cores and lower clocks. In DP (double precision), though, a Tesla leaves the GeForce in the dust.
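 
For a rough sense of that SP/DP gap, here is a back-of-the-envelope sketch; the core counts and clocks are from memory and approximate, so treat the output as illustrative rather than a benchmark:

    // Peak-throughput arithmetic behind the SP/DP claim (approximate specs).
    // Each CUDA core can retire one FMA per clock = 2 FLOPs.
    #include <cstdio>

    int main() {
        // GeForce GTX 580: 512 cores @ ~1.544 GHz shader clock, DP capped at 1/8 of SP
        double gtx580_sp = 512 * 1.544e9 * 2;   // ~1.58 TFLOPS
        double gtx580_dp = gtx580_sp / 8.0;     // ~0.20 TFLOPS
        // Tesla C2070: 448 cores @ ~1.15 GHz, DP at 1/2 of SP
        double c2070_sp  = 448 * 1.15e9 * 2;    // ~1.03 TFLOPS
        double c2070_dp  = c2070_sp / 2.0;      // ~0.52 TFLOPS
        printf("GTX 580: SP %.2f TFLOPS, DP %.2f TFLOPS\n", gtx580_sp / 1e12, gtx580_dp / 1e12);
        printf("C2070:   SP %.2f TFLOPS, DP %.2f TFLOPS\n", c2070_sp / 1e12, c2070_dp / 1e12);
        return 0;
    }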

And there's a reason why most scientific Tesla computation stations run Linux: it's much more efficient, and that has nothing to do with fanboyism. But that's not the issue here.

Basically what Arexma said. SLI is widely considered to degrade CUDA performance, and benchmark threads in these forums have shown that Cycles render times are noticeably shorter on Linux distros.

One further thing to point out, though: at present, multi-GPU support in Cycles is at a very early stage, and 2 GPUs will only give you about a 30% improvement over a single GPU.

So if you are worried about getting more juice out of your setup you should do three things:

-Disable SLI in the Nvidia control panel as Arexma said
-Set up a dual boot with the Linux distro you like best. Since you've been a Linux user for 14 years you surely know how to do it, and it will cost you no money.
-Be patient and wait until multi-gpu support is optimized in newer Blender builds. :wink:

OpenGL rendering pipeline, yes, which in turn is used by Blender itself for the viewport and in general for its UI. That’s why I have SLI enabled.

I’d start by disabling SLI in your NV Control Panel; if that doesn’t help, remove the SLI bridge as well.

No, modern Nvidia drivers completely ignore SLI for CUDA applications. Even if SLI is enabled, CUDA will crunch 2 "tasks" on 2 cards, not 1 "task" across 2 cards as in SLI mode. So this is irrelevant.
I even tried it in practice, and it confirmed what I said: no difference in performance at all.
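 
To be clear about what I mean by "2 tasks for 2 cards", here is a minimal sketch in plain CUDA host code (nothing Cycles-specific; the kernel is just a stand-in for a render tile):

    // Minimal sketch of the "one independent task per card" model.
    // Each device is selected explicitly; the SLI bridge never comes into play.
    #include <cstdio>
    #include <cuda_runtime.h>

    __global__ void fake_tile(float *out) {
        out[threadIdx.x] = threadIdx.x * 0.5f;   // stand-in for rendering a tile
    }

    int main() {
        int count = 0;
        cudaGetDeviceCount(&count);
        if (count > 8) count = 8;                // keep the example bounded
        float *buf[8] = { 0 };

        for (int dev = 0; dev < count; ++dev) {
            cudaSetDevice(dev);                  // address one physical card
            cudaMalloc(&buf[dev], 256 * sizeof(float));
            fake_tile<<<1, 256>>>(buf[dev]);     // asynchronous launch: its own "task"
        }
        for (int dev = 0; dev < count; ++dev) {
            cudaSetDevice(dev);
            cudaDeviceSynchronize();             // each card finishes independently
            cudaFree(buf[dev]);
        }
        printf("Ran one independent task on each of %d CUDA device(s)\n", count);
        return 0;
    }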

No professional card uses SLI for CUDA; it was only ever a gaming feature. Only recently have there been SLI-ready Quadros, and the only application for SLI with Quadro cards is to boost viewport performance, for instance to drive a stereoscopic display over a maximum of 8 screens with a lot of juice behind it, to run a real-time OpenGL model with a ridiculous amount of polygons, or to visualize point clouds in medical applications.

SLI on Quadros has been available since the first nForce Professional 2200, a chipset from the 2005 era. 7 years ago cannot be considered "recent". Again, the uses you mentioned are the reasons I have SLI activated on this system.

And there's a reason why most scientific Tesla computation stations run Linux: it's much more efficient, and that has nothing to do with fanboyism.
No. They use Linux because it's the only OS that supports that hardware (large clusters with lots of esoteric hardware) and at the same time has a native driver for the cards. Also, Linux is easier to program for in C/C++ (and to program for in general, given the wide availability of good programming tools), GCC is installed by default, and it is considered the standard in academia and the scientific world. Efficiency has nothing to do with it, and CUDA efficiency in computations doesn't depend on the operating system at all (bring numbers if you want to argue against this; I personally don't see any difference in performance). Most Tesla / Quadro certified workstations are sold with Windows, offering Linux only as an option when available. Just look at the driver download page for certified workstations. "Efficiency" is an overused word; it seems people reach for it when they have no arguments. The fact is: you have to bring numbers when you use that word.

  • but that’s not the issue here.
    So why even bring it up? Back on topic?

OK, you have practically solved my problem. You answered what I was looking for: it's a Cycles infancy problem, not an inefficiency of some sort.
Thank you for the answer.

BTW, about the "faster" Linux renderings: the majority of Linux users use customized builds with newer GCC versions and more aggressive optimizations, and that's why "the majority of Linux users report faster renderings". As you can see, it has nothing to do with the OS in general: I could compile a Windows build with those flags turned on and get a faster build than the standard Blender distribution on Linux. Linux is an operating system, not a miracle. Users always attribute performance to the OS, while the relevant factors in that area are rarely OS-dependent.

With Nvidia's stunt of making the driver render backface shading in software and no longer supporting it in hardware, using SLI for Blender is... awkward. You can't make up with 10 cards for what slows Blender down :wink: On top of that you've got the issues with VBOs. It should help, though, with the double draw of objects for outlining.

Yeah, absolutely. Actually, if you configure e.g. the Visual Studio compiler properly you'll get a binary superior to any GCC version: faster and smaller :wink:

Anyway, I had SLI under suspicion because there were massive problems before CUDA 4.0; after that I stopped following development. I stopped using SLI about 2 years ago, just before they finally fixed the multiple-monitor issue on GeForce SLI.
It was too little gain for too much hassle.

Without knowing the Cycles source by heart and the intestines of Nvidia's CUDA drivers, you can only take a long stick and poke in the dark :wink:

You are wrong simply because everything you are saying goes against how the x86 architecture works; and if you trust a benchmark and the words of a brand offering "a commercial product with another commercial product", you will probably never reach a good level of performance.

If you think that having 2 video cards is enough to get double the performance, you have probably never studied how a system actually works.

Every system where performance is a critical point runs a UNIX or Linux kernel, that's all I can tell you; and since you are talking about benchmarks you are, again, going by fancy stuff from the marketing language and from what the brand wants you to buy.

Nevertheless, GCC is a generic compiler suite. It's true that a good compiler can affect performance, but not as much as a really good driver. GCC is a good, old suite that can surely be improved, but I personally do not expect a huge gain in performance from version A to version B; a compiler is not what gives you cutting-edge performance, and it also plays a minor role in how an x86 architecture works.

Just to say: with a bad driver you can turn a 16-core cutting-edge machine into a 486, or worse.

If you want to stick with Windows, that's fine, but the industry standard for those looking for performance is Linux or BSD, that's all. And before judging an answer, please open a book on the x86 architecture to see how your computer, your hardware, your kernel and your drivers actually work.