Built myself a GPU render-farm out of mostly used mining gear


(Casey) #1

I’ve been plotting and planning this project for the past couple of months. After tinkering with animation and really enjoying it, but spending 18 hours to render a 6-second clip in 4K, I started looking into alternative methods.

At first I was contemplating picking up some old, second-hand servers and setting them up for network rendering, but after doing some math and cost calculations, I figured I could get a lot more bang for my buck with a GPU render farm.

So I picked up a six-pack of used mining RX 480s on eBay, and connected them with PCIe adapters…
image

…via USB, to more adapters…
image

…which are plugged into the two 1x PCIe ports on my Crosshair VII mobo, which run through the chipset so as not to take away any PCIe lanes from my GTX 1070.

Also needed was moar powah:

Now…as it turns out, the 1600, even in this configuration, is overkill. Under a full rendering load, the entire system pulls 800 W, where I was expecting closer to 1200. I guess that just means I have room for two more RX 480s, considering the PSU still has two more VGA 8-pins to use. (Freaking monster has 9x 8-pin outs!)

OK, OK, so how does it actually perform? Well, for a baseline, I rendered the Classroom scene in 2.79b with a tile size of 320x180 on the GTX 1070 by itself, and it rendered in 8:18.47.

With the render farm…

Now…can I render it even faster without sacrificing any quality? Gimme some ideas if you think I can.

Latest time: image


(Lincoln Deen) #2

Windows has a thing where GPUs are slightly slower than they would be on Linux. Idk what OS it’s on, just keep that in mind!


(Casey) #3

Yeah, I’m just waaaaay too stupid to use Linux. I’ve tried it several times, and I always break it and can’t figure out how to fix it. I’m to the point where I hate Linux as any sort of desktop environment.

Which sucks, because I have to be a Windows normie, which I don’t like either.

Also, the game I do content creation for doesn’t work on Linux Steam Play, so Linux is a no-go there as well.


(Lumpengnom) #4

How much did it cost to buy the components?


(Casey) #5

PSU, all six GPUs, and the various PCIe bits and longer USB cables: in the neighborhood of $1,100 US.


(kabu) #6

“moar powah” is the part I like best. :grimacing::grimacing::grimacing:


(LordRaven) #7

This looks awesome!
I was thinking about doing something similar myself.
But you’ve shown me an alternate solution - thank you!

Btw: can you do more tests and post the results? I would really like to know how much faster that would be compared to my GTX 1080 Ti / 970 combo.


(Martynas Žiemys) #8

That sounds quite reasonable. A weekend is 48 hours…

Anyway, you could look into connecting that to the SheepIt free crowd render farm. It would collect points quite fast, so you could have access to a much bigger render farm with very short waiting queues, if its limitations suit your situation (I think the main ones are: around a 500 MB file-size limit, no scripting, and no 32-bit EXR output, only 16-bit PNGs). It doesn’t render small jobs like stills from benchmark scenes faster, but if you have animations or a lot of huge renders and we are talking about days, not hours, the speedup is incredible.


(Casey) #9

Sure thing. Some notes: the first render is usually slow because it spends some time loading kernels, so I do a sacrificial first render, then timed renders for the results achieved below.

Also, these aren’t necessarily ideal condition tests, I was watching videos, chatting on discord and doing these renders in the background, which is likely why my classroom scene rendered slower today than it did yesterday.

Some render test results:

(This is the older, orange BMW)
BMW scene with 160x90 tile size: 52.28
BMW scene with base 256x256 tile size: 58.77
BMW scene with 320x180 tile size: 1:00.82

CLASSROOM scene with 256x256 tile size: 1:53.12
CLASSROOM scene with 320x180 tile size: 1:54.12
CLASSROOM scene with 320x360 tile size: 1:59.14

FISHY CAT scene with 252x117 tile size: 31.76
FISHY CAT scene with 256x256 tile size: 31.77
FISHY CAT scene with 252x234 tile size: 31.90

GOOSEBERRY seems to be too large to render on GPU; I got an error:
CL_MEM_OBJECT_ALLOCATION_FAILURE

PAVILION scene was rendering pink; not sure what’s going on there, so I didn’t go through with testing.


(LordRaven) #10

Thank you!


(Grzesiek) #11

That is quite nice.

Now the main question: can you run the BMW scene starting with 1x GPU and keep adding the remaining ones, to see what improvement you get with each additional GPU?

My main “concern” is that the PCIe 1x via USB solution is severely limiting the performance of the render farm.

Some time ago another post here used a similar approach, with some GPUs in PCIe x16 slots and a few on 1x extension units like the ones you have here. It turned out that the 4 GPUs on PCIe x16 showed improvement, while the remaining 4 on PCIe 1x barely improved the render speeds.

I’m asking mainly because I want to follow in your footsteps, but want to be sure that the 1x PCIe extension cables are not limiting the overall performance.


(Grzesiek) #12

Also, about the memory allocation failure: are your RX 480s the 4GB models or the 8GB models?


(Casey) #13

8GB

I only ever ran the numbers in theory. PCIe 2.0 bandwidth is 500MB/s per lane, and that’s simultaneous in each direction. Which means, even if you could saturate each card with a full 8GB file, it would take just ~16 seconds (in theory) to move that data to and from the GPU.

Since I’m working with much smaller files, the PCIe bandwidth should be inconsequential. And even with larger files, the fact they would take longer to render should make a 16 second delay inconsequential as well.
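That back-of-envelope math can be sketched in a few lines (assuming, as stated above, roughly 500 MB/s per lane per direction for PCIe 2.0 and an 8 GB card):

```python
# Worst-case time to fill a card's VRAM over a single PCIe 2.0 lane.
# Assumptions from the post: ~500 MB/s per lane per direction, 8 GB card.
LANE_BW_MB_S = 500          # PCIe 2.0, per lane, per direction (approx.)
VRAM_MB = 8 * 1024          # 8 GB of VRAM

transfer_s = VRAM_MB / LANE_BW_MB_S
print(f"Worst-case transfer over 1x PCIe 2.0: {transfer_s:.1f} s")  # ~16.4 s
```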

But it’s a good question, and theory doesn’t always translate into reality, so let’s run the numbers…

*Again, the tests are being run in the background while I watch videos, surf the web, chat on Discord, etc. Also, it might bear keeping in mind that I’m on a Ryzen 2700X, an 8c/16t CPU, so multitasking is no sweat on this workstation. Your mileage may vary.

1GPU : 4:00.55
2GPUs: 2:08.07
3GPUs: 1:31.50
4GPUs: 1:13.42
5GPUs: 1:03.11
6GPUs: 0:56.19

So indeed, there are some diminishing returns. The second GPU scales almost perfectly, but to cut render time in half again, you need three more GPUs. But is this a PCIe limitation? A software scheduling issue? How would it look on Linux? Those are questions I can’t answer. Maybe someone with enough mobo slots to run 4 GPUs, say on a Threadripper mobo with all those PCIe lanes (x16/x8/x16/x8), can test whether each additional GPU scales more efficiently.
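For anyone who wants to eyeball the scaling, here’s a quick sketch that just feeds in the BMW times above (converted to seconds) and prints speedup and per-GPU efficiency:

```python
# Speedup and per-GPU efficiency from the BMW timings above,
# with times converted to seconds (1 GPU = baseline).
times = {1: 240.55, 2: 128.07, 3: 91.50, 4: 73.42, 5: 63.11, 6: 56.19}

base = times[1]
for n, t in times.items():
    speedup = base / t          # how many times faster than 1 GPU
    efficiency = speedup / n    # 1.0 would be perfect scaling
    print(f"{n} GPU(s): {speedup:.2f}x speedup, {efficiency:.0%} efficiency")
```

The 2-GPU run comes out around 94% efficiency and the 6-GPU run around 71%, which matches the diminishing-returns impression.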


(Grzesiek) #14

I have a Threadripper, with 1 Vega and 4 480s (though only 1 of them is currently plugged in, until I water-cool them all). The goal is to use 5 PCIe slots on my Gigabyte X399 Designare board, leaving 4 NVMe slots where I could potentially run converters and get 3 more cards via the PCIe extensions and an NVMe-to-PCIe converter.

From my earlier tests in September 2017 (so quite some time ago); don’t recall the tile sizes… This was run on an ASRock dual-Xeon (2011 v1) board… but still had 3 of them in PCIe 3.0 x16 and one in PCIe 2.0 x4…

On the classroom scene
4x RX 480 : 2.79 R2

1x - 8m 59s
2x - 4m 31s
3x - 3m 11s
4x - 2m 20s

So overall near “linear”


(Casey) #15

I have a few questions for you. How will you fit 2-slot GPUs into that crammed middle slot on that mobo? I get that watercooling makes a GPU thinner, but what about the I/O shield/plate? Are you able to dispense with the video outputs to make the overall package thinner?

Also, on that mobo: Wendell at L1T did a review of it and liked it a lot, but that middle slot is just PCIe 2.0, it runs through the chipset, and I think it was x4 anyway.

Timestamped to the PCIe layout

I went ahead and ran a benchmark series on the classroom scene to see if scaling improves, or gets worse, with my setup:

1GPU : 8:41.87
2GPUs: 4:31.92
3GPUs: 3:18.73
4GPUs: 2:32.85
5GPUs: 2:12.24
6GPUs: 1:56.69

Similar diminishing returns as the BMW scene. I mean, it’ll still add up over a multi-day render, but it probably wasn’t worth the investment :stuck_out_tongue: Like that’s going to stop me from getting two more! :smile:
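One way to put a number on those diminishing returns is a rough Amdahl’s-law estimate from the 1-GPU and 6-GPU classroom times. This is purely illustrative: it lumps PCIe transfers, scheduling, and any other overhead into one fixed “serial” fraction.

```python
# Rough Amdahl's-law fit to the classroom timings above.
# t(n) = t1 * (s + (1 - s) / n), solved for the serial fraction s.
t1 = 8 * 60 + 41.87     # 1-GPU time: 8:41.87 -> 521.87 s
t6 = 60 + 56.69         # 6-GPU time: 1:56.69 -> 116.69 s
n = 6

s = (t6 / t1 - 1 / n) / (1 - 1 / n)
print(f"Estimated non-parallel fraction: {s:.1%}")   # roughly 7%
```

Under that (very crude) model, even infinitely many GPUs would bottom out around t1 * s, i.e. somewhere in the mid-30-second range for this scene.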


(Grzesiek) #16

Yup, fully aware of that… and slightly annoyed that no one has yet released a 7-PCIe-slot design. Asus did for the higher-end Intel workstation systems.

Still, I want to use NVMe-to-PCIe x4 converters; that way I can utilize the NVMe slots I’m not using as GPU slots, and as they are PCIe 3.0 x4 each, that would provide double the bandwidth of the middle slot…

Still, as for the GPUs I’m using: they are reference designs with only one row of outputs, so making them single-slot with water cooling is easy.


(blendit012) #17

You can’t break Linux. It’s easy; use Ubuntu or Linux Lite.


(Lalaland) #18

I am planning something similar; a couple of questions.
Can you use mining cards out of the box, i.e. do the drivers work for Blender?
Could I use one of these “mining” mobos on a network and just stack it with GPUs?
https://www.asrock.com/mb/intel/h81%20pro%20btc/


(joseph raccoon) #19

This is going to get into some brass tacks on hardware and noise rejection. PCIe 1x is enough to push a video card if it is just rendering; yes, more lanes is faster, but meh, budget solutions. (You can also retool some server riser boards for this, but very much YMMV, and don’t use hardware you love for this.)

But your main loss in a setup like this is going to be EMI noise, so make sure all of your USB cables have a ferrite noise suppressor. Something that most non-electronics types don’t appreciate is how close the data levels on PCIe are to the noise floor, so any added noise will result in the card having to ask for data to be resent. But this can be mitigated somewhat:

1- Buy a case to keep your cards in. Honestly, I would just retool a server case for this; it will have enough room for your motherboard, and you will be able to fit several cards in it.

2- For routing cables inside the case, I would use aluminum foil tape to keep them shielded and at the same electrical level as the case, and use as short a run as you can get away with!!! You will also use this tape to secure the length of each cable to the case. (It’s kind of on the ghetto side of how to do this, but it works in a pinch.)

3- If you have more than one power supply, run a thick copper grounding cable between their cases. Much of controlling electrical noise is keeping all components at the same level so there is no difference in electrical potential between them.

4- Avoid sharp bends in your cables, and avoid loops!!!

5- If you need to run some cables to another case, use braided copper sleeving to keep them bundled together. First route them, then label each end (be nice to your future self), then use dental floss or some waxed cord to snugly bind them together every inch or few centimeters, making sure they aren’t twisting and overlapping. Then feed the length of this down the copper sleeve; an easy way is to use masking tape to keep all the ends together, and I’ll often just tape the end of the wires to a broom handle and pull them through that way (don’t laugh, it works). This copper shielding will first shield each of your cables from EMI, and because of the way they are bundled, any EMI they do receive will cause all of them to rise and fall at (mostly) the same rate, reducing differences in potential between the PCIe runs.


(Lalaland) #20

Do you know if I could use one of these boards and stack it with Sapphire RX 470 mining cards?