Second GPU gets disabled after rendering an animation, please help!

Hi guys, I have a very weird and frustrating problem that keeps happening randomly with my 2 × GTX 1080 Ti Nvidia cards in Cycles: my second (lower) GPU gets disabled after I render any animation. It happens across many projects and is not tied to a specific scene. Here is exactly how the problem occurs:
- The animation render completes with no problems.
- I try to render again, either in the same Blender project or in another instance of Blender, and I get the error “CUDA error: Out of memory”. I know this looks like a VRAM problem, but it happens even with the simplest scenes.
- When I open Blender's User Preferences and disable the lower GPU, the render works fine using only the upper GPU (and when I open a new Blender project, I only get the option of one GPU).

Now here is where the weird thing happens: when I try to close the Blender file the error occurred in, Blender closes, but the console window won't close; it's stuck. Even when I try to kill it with Task Manager, it won't exit. I even installed special software that kills a given process, and it STILL won't exit.
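One way to narrow this down while the console is stuck (a diagnostic sketch, not a fix): `nvidia-smi --query-compute-apps=pid,process_name,used_memory --format=csv,noheader` lists the compute processes still holding VRAM on each GPU, which would show whether a zombie Blender process is keeping the second card busy. Below is a small parser for that CSV output; the sample line is hypothetical, not taken from this machine:

```python
def parse_gpu_processes(csv_text):
    """Parse `nvidia-smi --query-compute-apps=pid,process_name,used_memory
    --format=csv,noheader` output into (pid, process_name, used_memory) tuples."""
    rows = []
    for line in csv_text.strip().splitlines():
        # nvidia-smi's CSV mode separates fields with ", "
        pid, name, mem = (field.strip() for field in line.split(", "))
        rows.append((int(pid), name, mem))
    return rows

# Hypothetical sample: a leftover blender.exe still holding VRAM
sample = "4321, blender.exe, 512 MiB\n"
print(parse_gpu_processes(sample))
```

If a Blender PID shows up here after the window has closed, the process is stuck inside the driver, which would explain why Task Manager can't kill it and why Windows hangs on shutdown.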

I've also become familiar with my desktop's fan noises: when there is a slight noise from the case, I know a process is running in the background, because the fans are not completely silent like in the idle state. So maybe the second GPU got stuck at a certain point and won't let Blender exit?

Even when I try to shut down or restart the computer, it won't respond: it says Windows is shutting down and the monitor goes black, but the PC stays on and won't power off. I have to hold the power button until the computer is off.

When I start the PC again, everything is back to normal and the second GPU works normally. Then, a few animation renders later, it randomly happens again, and I have to shut down the computer with the power button many times a day.

I tried updating the graphics driver and resetting Blender to factory settings, but it keeps happening randomly, even with the simplest scene.

And weirdly, it only happens when an animation finishes: I can render heavy animations for 12–20 hours without interruption, using both GPUs (which should rule out overheating problems). It only fails once an animation is complete.

Also, sometimes it shows an error in the console (CUDA error: Invalid value in cuCtxDestroy(context), line 279), but not every time.
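For reference, both error messages in this thread are CUDA driver API status codes (an illustrative sketch of the `CUresult` values from `cuda.h`, not Blender's actual code). An "Invalid value" from `cuCtxDestroy` means the driver rejected the context handle, typically because the context was already torn down or the device dropped off the bus, which would match a GPU that then "disappears" until reboot:

```python
# Selected CUDA driver API status codes, as defined in cuda.h's CUresult enum.
CU_RESULTS = {
    0: "CUDA_SUCCESS",
    1: "CUDA_ERROR_INVALID_VALUE",   # stale/bad handle, e.g. in cuCtxDestroy
    2: "CUDA_ERROR_OUT_OF_MEMORY",   # the error seen on the next render attempt
}

def describe(code):
    """Map a raw CUresult code to its symbolic name."""
    return CU_RESULTS.get(code, "unknown CUresult %d" % code)

print(describe(1))  # the cuCtxDestroy failure -> CUDA_ERROR_INVALID_VALUE
print(describe(2))  # the re-render failure   -> CUDA_ERROR_OUT_OF_MEMORY
```

The pattern of a failed context teardown followed by "out of memory" on the next render suggests the first render's VRAM was never released, rather than the scene actually being too large.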

I attached GPU-Z screenshots of the upper GPU and the lower GPU (the one that gets disabled); something doesn't look right.

System specs: i7-7700K, 32 GB RAM, Windows 10, 2 × GTX 1080 Ti, EVGA SuperNOVA 1300 W, Blender version 2.79

If anyone can help or has even a slight idea how to fix this, it would be really helpful.

Thanks !

I know it may seem stupid, but have you tried taking the cards out and reseating them after cleaning the contacts, OR swapping the cards into opposite slots? Maybe one of your PCIe slots is dirty or damaged?

I had an issue kinda like that on a machine with a smaller power supply; maybe tether a second power supply to your other video card and see if that changes anything.

It was an i3 machine that I tossed a couple of video cards into to use as a render node, but it would flake out and stop seeing one of the cards sometimes after rendering. The issue resolved itself when I swapped the power supply for a new one (well, a higher-wattage used one).

Hi Steve, actually I never took the GPUs out of their slots; when I clean, I just do some light compressed-air dusting.

The system is powered by an EVGA SuperNOVA 1300 W; ain't that enough?

Plus, as I mentioned above, I can render with the system for 1–2 days non-stop with no problems. The problem only happens when the animation export is complete, and it is only resolved when I forcefully shut down the computer.

You might search for information about multiple GPUs on your motherboard and make sure you have the second card plugged into the most appropriate slot. The GPU-Z output shows different PCIe connectivity and no UEFI/BIOS version for the lower card. Also, there's no checkbox next to CUDA for the lower card, and other data is missing or unreadable.

You might try swapping the two cards and see if the problem stays with the card or the slot.

If it seems isolated to one of the cards, I’d try to RMA it if it’s under warranty and you can convince them it’s not working right. Or see if you can find someone with another 1080 Ti who will let you try swapping it in to see if a different card has the same issue.

Just passing along a fix I had for a problem like yours. As a rule-out, you could try swapping around which power rails you are using, and if the power supply has one of those energy-saving (Eco) switches, you could try turning it off. But IIRC there was a recall on some units of that power supply series, so that could have something to do with it.

You can try cleaning the cards/slots and swapping things around, but I would first see if you can duplicate the problem while running a single video card. If you can't, I would be very suspicious of a power supply that had a recall on it, because that type of hang feels like a hardware fault.