BSOD on rendering with Cycles

First my specs:

OS: Windows 10 Pro 64-bit

CPU: Core i7 2600K

RAM: 16.0 GB Dual-Channel DDR3 @ 668MHz (9-9-9-24)

Motherboard: Asus Maximus IV Extreme

GPU: GTX 580 x 3

HDD: Samsung SSD 850 EVO 250GB (SSD) <–blender lives here
Corsair Force 3 SSD (SSD) <–blender lives here also
Seagate ST2000DM001-1CH164 (SATA) x 2 (In RAID setup) <— render files are stored here

The CPU and GPUs are liquid-cooled by a custom-built loop.

Problem:

When I begin rendering with Cycles, after a few minutes Windows shows a blue screen of death and restarts.
I have tried Blender 2.76b, 2.76 and 2.75, with no difference.
Tile sizes used are 256 and 512, with no difference.

When rendering with all three GPUs, the blue screen comes quickest.
It has also occurred when rendering with only one GPU.

There is no similar problem when I play GPU-heavy games or run FurMark or Prime95 (I haven’t done overnight tests yet, though).

What has been changed:

Recently I upgraded from Windows 7 to Windows 10.
I changed the motherboard from an Asus P8Z77-V LK to an Asus Maximus IV Extreme.

Before the upgrades, I never had this issue.
Also, before the Windows 10 updates, I never had this issue.
I rendered a heavy image that was printed out 170 cm high; it took 10 hours to render and had many, many tiles.

All drivers are up to date.

Here is my crash dump:

Crash Dump Analysis provided by OSR Open Systems Resources, Inc. (http://www.osr.com)
Online Crash Dump Analysis Service
See http://www.osronline.com for more information
Windows 8 Kernel Version 10586 MP (8 procs) Free x64
Product: WinNt, suite: TerminalServer SingleUserTS
Built by: 10586.17.amd64fre.th2_release.151121-2308
Machine Name:
Kernel base = 0xfffff80146a08000 PsLoadedModuleList = 0xfffff80146ce6c70
Debug session time: Fri Dec 11 14:19:53.891 2015 (UTC - 5:00)
System Uptime: 0 days 0:32:08.578


Bugcheck Analysis

DPC_WATCHDOG_VIOLATION (133)
The DPC watchdog detected a prolonged run time at an IRQL of DISPATCH_LEVEL
or above.
Arguments:
Arg1: 0000000000000000, A single DPC or ISR exceeded its time allotment. The offending
component can usually be identified with a stack trace.
Arg2: 0000000000000501, The DPC time count (in ticks).
Arg3: 0000000000000500, The DPC time allotment (in ticks).
Arg4: 0000000000000000

Debugging Details:

TRIAGER: Could not open triage file : e:\dump_analysis\program\triage\modclass.ini, error 2

DEFAULT_BUCKET_ID: WIN8_DRIVER_FAULT

BUGCHECK_STR: 0x133

PROCESS_NAME: System

CURRENT_IRQL: d

BAD_PAGES_DETECTED: db68

LAST_CONTROL_TRANSFER: from 0000000000000000 to fffff80146b4a760

STACK_TEXT:
fffff80148b4ed48 0000000000000000 : 0000000000000000 0000000000000000 0000000000000000 0000000000000000 : nt!KeBugCheckEx

STACK_COMMAND: kb

SYMBOL_NAME: PAGE_NOT_ZERO

FOLLOWUP_NAME: MachineOwner

MODULE_NAME: Unknown_Module

IMAGE_NAME: Unknown_Image

DEBUG_FLR_IMAGE_TIMESTAMP: 0

BUCKET_ID: PAGE_NOT_ZERO

Followup: MachineOwner

*** Memory manager detected 56168 instance(s) of page corruption, target is likely to have memory corruption.
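
The hexadecimal fields in the dump above can be cross-checked with a few lines of Python. This is only a sketch: the values are copied by hand from the report, and the names `dump_fields` and `parse_hex` are made up for illustration.

```python
# Values copied by hand from the crash dump above (all hexadecimal).
dump_fields = {
    "Arg2": "0000000000000501",  # DPC time count, in ticks
    "Arg3": "0000000000000500",  # DPC time allotment, in ticks
    "BAD_PAGES_DETECTED": "db68",
}


def parse_hex(text):
    """Convert a hexadecimal string from the dump into an integer."""
    return int(text, 16)


dpc_ticks = parse_hex(dump_fields["Arg2"])
dpc_allotment = parse_hex(dump_fields["Arg3"])
bad_pages = parse_hex(dump_fields["BAD_PAGES_DETECTED"])

print(f"DPC ran for {dpc_ticks} ticks, allotment was {dpc_allotment} ticks")
print(f"Over the allotment by {dpc_ticks - dpc_allotment} tick(s)")
print(f"Bad pages: {bad_pages}")
```

The bad-page count decodes to 56168, the same number as in the "instance(s) of page corruption" line, so both entries report the same thing.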

Thank you for your help, it is really appreciated!
This is my work computer, so I really need to solve this issue.

Best regards,

Ohto

Maybe the cards are overheating too quickly and crashing the system.

Hi Brentison, thank you for your reply.

In the past one GPU had problems with its memory chips heating up, but that’s fixed now.
I don’t have full-cover blocks on the GPUs.
And that caused a CUDA error, not a BSOD.

Best regards

Ohto

Hi Ohto, I know from the Octane forum that some users get a BSOD with GPU rendering because the power supply is near its breaking point.
You can’t compare render engines with games or most benchmarks; render engines use ~100% of the GPU.
Maybe you can check with the Octane demo: https://home.otoy.com/render/octane-render/demo
Demo files are on the page too.
If Octane works fine, the problem is on the Blender side and you can file a bug report, for example.

Cheers, mib

I notice a time exceeded error. Somewhere in the Windows registry there is a key for the GPU timeout before it will die. I forget where it is or what it’s called; I think it has Timeout in the name. By default it is quite low, like 500 ms maybe; you could try increasing it. Best of luck troubleshooting.
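
For reference, the GPU timeout described here is most likely the Timeout Detection and Recovery (TDR) delay, stored as the `TdrDelay` value under `HKLM\SYSTEM\CurrentControlSet\Control\GraphicsDrivers`. Microsoft documents its default as 2 seconds when the value is absent. Below is a minimal read-only sketch for checking it, assuming Python on Windows; the function name `read_tdr_delay` is made up for illustration.

```python
import sys

TDR_KEY = r"SYSTEM\CurrentControlSet\Control\GraphicsDrivers"
TDR_VALUE = "TdrDelay"
DEFAULT_TDR_DELAY_SECONDS = 2  # documented default when the value is not set


def read_tdr_delay():
    """Return the TDR delay in seconds, or None when not running on Windows."""
    if sys.platform != "win32":
        return None
    import winreg  # standard library, Windows only

    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, TDR_KEY) as key:
            value, _value_type = winreg.QueryValueEx(key, TDR_VALUE)
            return int(value)
    except FileNotFoundError:
        # Key or value missing: Windows falls back to the default delay.
        return DEFAULT_TDR_DELAY_SECONDS


print("TdrDelay (seconds):", read_tdr_delay())
```

The sketch only reads the value. Changing it means editing the registry as an administrator and rebooting.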

Also, you could do a test making sure you are not using the GPU for the monitor display. You mention games are fine, which makes me think you are using the GPU for display, which might cause conflicts when trying to render. That is changed in the BIOS, different for everyone, usually something like system, display, screen…

Hi, I didn’t read your crash log carefully.
If it crashes with one GPU too, it can’t be the power supply.
The timeout should restart the driver, not crash Windows; look here: https://msdn.microsoft.com/en-us/Library/Windows/Hardware/ff569918(v=vs.85).aspx

Cheers, mib

They say it’s hardware related and that installing or updating the appropriate drivers usually helps. After you upgraded to W10, did you take care of (re)installing the NVIDIA drivers for the cards? MS might have decided to use its own something instead.

“…upgraded to Windows 10 from 7…” yet the error report mentions “WIN8_DRIVER_FAULT” and “Windows 8 Kernel Version 10586 MP (8 procs) Free x64; Product: WinNt, suite: TerminalServer SingleUserTS”. I’m not certain whether W10 should have a W8 kernel, or what TerminalServer is here. If this is just a part of Windows which makes sure you can read error messages after the crash, it’s surely related to the video:

“The Microsoft Windows Terminal Server (WTS) is a server program running on its Windows NT 4.0 (or higher) operating system that provides the graphical user interface (GUI) of the Windows desktop to user terminals that don’t have this capability themselves.”

Hi all, and thank you for helping me! I downloaded the Octane Render benchmark and did some testing with it. The blue screen comes with 3 cards, like in Blender. But when testing the cards one at a time, I was able to pinpoint the GPU which causes the blue screen. The other two pass the benchmark without problems.

With GPU-Z I see that the problematic GPU’s PCIe link runs in 1.1 mode, not in 2.0 like the two working cards. (I used the stress test in GPU-Z to confirm it.)
Could this 1.1 thing cause this problem?
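
To put the 1.1 versus 2.0 difference in numbers, here is a small sketch of the theoretical link bandwidth. The per-lane rates (2.5 GT/s and 5 GT/s) and the 8b/10b encoding come from the PCIe specifications; the lane widths are assumptions, since the width GPU-Z reported is not given above. This only shows transfer capacity and does not say whether the link mode is related to the crash.

```python
# Per-lane signalling rate in GT/s for the two PCIe generations discussed here.
TRANSFER_RATE_GT_S = {"1.1": 2.5, "2.0": 5.0}

# Both generations use 8b/10b encoding: 8 payload bits per 10 bits on the wire.
ENCODING_EFFICIENCY = 8 / 10


def bandwidth_gb_s(generation, lanes):
    """Theoretical one-direction bandwidth in GB/s (1 GB = 10**9 bytes)."""
    gigabits = TRANSFER_RATE_GT_S[generation] * ENCODING_EFFICIENCY * lanes
    return gigabits / 8  # bits to bytes


for gen in ("1.1", "2.0"):
    for lanes in (8, 16):  # assumed widths; GPU-Z shows the real one
        print(f"PCIe {gen} x{lanes}: {bandwidth_gb_s(gen, lanes):.1f} GB/s")
```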

Best regards

Ohto

I tried a clean reinstall of the NVIDIA drivers, but it still runs in 1.1 mode.

Ohto

Hi, I am not aware of any way to switch the mode of PCIe slots.
If this were possible, I would switch all of my slots to 3.0. :slight_smile:
What mainboard do you use?

Cheers, mib

Hi, my 580s don’t support 3.0, only 2.0. :slight_smile:
But yeah, I think it is a heat problem. I used a table fan to bring lots of extra cooling to the GPUs, and the troubled GPU did pass the benchmark. Then I took the table fan away and re-ran the benchmark… and blue screen again.

No idea why it was working OK for a while and has now started acting up. But yes, I will need to think about some extra cooling on either the memory chips or the power inputs. It could be the power inputs, as the memory problems gave a CUDA error, not a blue screen.
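
A heat test like the one with the table fan can be followed more closely by logging temperatures per card. Here is a rough sketch that polls `nvidia-smi` and parses its CSV output. It assumes `nvidia-smi` is on the PATH and that the driver reports these fields for the card, which may not hold for older GeForce models. The function names and the sample line are made up for illustration and are not measurements from this machine.

```python
import csv
import io
import subprocess

QUERY_FIELDS = "index,name,temperature.gpu,pcie.link.gen.current"


def parse_gpu_csv(text):
    """Parse nvidia-smi CSV output (no header, no units) into dicts."""
    rows = []
    for record in csv.reader(io.StringIO(text), skipinitialspace=True):
        if len(record) != 4:
            continue  # skip blank or unexpected lines
        index, name, temp, pcie_gen = record
        rows.append({
            "index": int(index),
            "name": name,
            # Unsupported fields come back as a non-numeric placeholder.
            "temp_c": int(temp) if temp.isdigit() else None,
            "pcie_gen": int(pcie_gen) if pcie_gen.isdigit() else None,
        })
    return rows


def query_gpus():
    """Run nvidia-smi once and return one dict per GPU."""
    out = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY_FIELDS}",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_gpu_csv(out)


# Made-up sample output, for illustration only.
sample = "0, GeForce GTX 580, 71, 2\n1, GeForce GTX 580, 93, 1\n"
for gpu in parse_gpu_csv(sample):
    print(gpu)
```

Calling `query_gpus()` every few seconds during a render, with and without the extra fan, would show which card climbs fastest before the crash.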

Thank you all for your help!! :slight_smile:

Best regards

Ohto

I actually wondered if something changed in Blender’s code…I use a laptop for Blender, and I’m still on Windows 7. My drivers haven’t been updated in years because the <bleep>ing manufacturers don’t update them for mobile cards…so I’m on the same Windows, same drivers, same equipment, etc., but with newer versions of Blender I get BSODs sometimes. I never got them in the past. I wondered if maybe I’m just using more complicated scenes these days, but I got a BSOD just yesterday on a very simple scene. Not sure when it started but I’ve gotten to where I’m doing hardly anything in Blender until I can upgrade to a better machine. I just assumed my crappy laptop has reached the end of its useful Blender life.

If this really is a thermal issue, you could try having your notebook cleaned (dust blown out from the fans etc.).