Next-Gen GPU

Hi Ozu,

I wouldn’t bother with a 2.5-slot card installation; they’ll cook to death. It’s doable on a test bench or in an open build, but not in a standard case build. I’ve seen plenty of builds like this on forums that have ended in tears.

I’ve seen a Gigabyte 3090 Blower edition and I know there’s a Seahawk Hybrid coming, so I’ll take my time. I’m going to spec out a new workstation once I’ve finished evaluating renderers, and by the time I’ve done that all the new CPUs and GPUs will have been released. I have a suspicion it’ll be a 32-core Threadripper and dual 3090s, but who knows.

Videocardz is reporting Gigabyte product codes that show the rumoured 3070 Ti and 3080 20GB.

The Big Navi rumour mill is in full swing (I think AMD is leaking this stuff), and it is claimed the 6900XT is clocked at 2.2GHz and will compete with the 3090. It wouldn’t surprise me if that were true in gaming, because Ampere’s gaming performance isn’t a massive leap forward. In the right benchmark the compute performance of Ampere is very good, but some benchmarks, like Redshift, show a much lower performance uplift than Cycles and V-Ray. Unless AMD have a prosumer CDNA compute card to release alongside Big Navi, my gut feeling is Ampere will beat Big Navi in Cycles.

Big Navi doesn’t necessarily have to beat the 3090; it can compete with the 3080, offer much better performance per watt, easily allow dual-GPU installations without fear of cooking the GPUs, and do it at a compelling price.

Current Ampere drivers have bugs: OptiX denoising is broken, with a fix a couple of weeks away. A restock of GPUs will probably take much longer!

Redshift shows a lower performance uplift because it does less raytracing than Cycles or Octane; it kind of fakes lights and other things for speed. That made it the fastest renderer before the RTX cards introduced Tensor cores. With Tensor cores, Redshift may no longer be the fastest, because of the shortcuts it took and the smaller amount of raytracing it does. I haven’t run tests between the latest Cycles with OptiX and Redshift, but it would be fun to see what the speed difference is on the same scene.

On GPU restocks, the Gamers Nexus video Voidium posted says there is a huge effort now to be the first to restock the 3080 so they can get the sales. They are even air-shipping at great cost to be faster. It also says October is going to be the big month for restocking, and many people who really want a card will probably be able to get one by November. I wouldn’t call this initial launch a paper launch; I’d call it a scalper launch, because nothing was in place to stop the bots.


No need to rush things. For us (the PC is our work tool), stability is the utmost priority, so it’s better to let the drivers mature and the cards become widely available, for both Nvidia and AMD.

For me personally, I won’t change anything until the full lineup (from entry level to high end) of these new-gen cards is available. I prefer to have choices instead of rushing head-down just to buy a shiny new toy.

I suspect this paper launch was made on purpose by Nvidia; I guess it’s part of the trendy viral marketing that represents the new way of advertising things nowadays…


Redshift’s unique selling point was speed, and that mostly came from a highly optimised CUDA-based intersection engine, so it didn’t benefit as much from RTX as, for example, Cycles did.

We can see from the Blender CUDA-only benchmarks that Ampere has very impressive compute capability. It looks like Cycles is able to make use of ALL the CUDA cores and get maximum benefit from the new architecture, and Redshift is not. It looks like the cheats Redshift does behind the scenes have bitten it on the arse by only utilising the weaker side of the Ampere architecture.

I wouldn’t be surprised if Cycles were now faster than Redshift; RTX and denoising stripped Redshift of most of its speed advantage, and Ampere appears to have removed whatever advantage was left. It’ll be interesting to see how fast E-Cycles is: do its own optimisations hinder or help performance on Ampere?

ProRender has always had a speed issue. If it benefits from hardware ray tracing like Cycles did, and if Big Navi can compete with Ampere in compute, then ProRender 2.0 becomes a very attractive option. ProRender has always been an extremely aesthetically pleasing renderer, as good as Arnold IMHO; it just had a speed issue. There are a lot of ifs involved, but solve the speed issue with hardware and ProRender deserves another look.

From both the coverage of the Ampere 30 series and the benchmarks, it seems RTX was literally doubled down on. Instead of 8x the work being done per cycle, they doubled that to 16x while drastically increasing the raw speed of the rest of the card. This strikes a balance that gives 2x the performance in Blender Cycles using OptiX and in games that use raytracing. When the Tensor cores aren’t used, we see less than 2x, because the raw speed was not quite doubled.

What it all means is E-Cycles should see the same 2x boost from the 3080 that regular Cycles does.

Something else I found quite funny: Sony was saying all PS5 games are going to load so much faster than on PC because of the new custom processor that decompresses everything from the SSD, making it 100x faster. Then Nvidia comes along and says “PC master race.”

Here is the PS5 deep dive, if anyone wants an in-depth look at the impressive tech Nvidia glossed over.

First CUDA benchmarks of the 3090 at Blender Open Data. This card is 20-25% faster than the 3080 in CUDA rendering… let’s see with OptiX. Take this with a grain of salt, but the results are consistent with what was expected:
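As a quick sanity check on that 20-25% figure, here’s a back-of-envelope comparison using the published CUDA core counts of the two cards. Clock speed differences are ignored, so this is only a rough bound, not a prediction:

```python
# Published shader (FP32) core counts for the two cards.
cores = {"RTX 3080": 8704, "RTX 3090": 10496}

ratio = cores["RTX 3090"] / cores["RTX 3080"]
print(f"core-count ratio: {ratio:.3f}")  # ~1.21, i.e. ~21% more cores
```

A 20-25% rendering lead lines up almost exactly with naive core-count scaling, which supports the reading that Cycles is scaling close to linearly with the extra cores.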

An in-depth look at the GA-102 GPU architecture: the official whitepaper.

As far as I understand, Tensor cores are not used in path tracing apart from denoising.

RT cores do the ray intersection tests, and CUDA does the shading. RT cores made a big difference to Cycles because Cycles’ CUDA-based ray intersection code was not as well optimised as Redshift’s. But RT cores can only make so much difference, and they only make a big difference on scenes like fields of grass and forests of trees. Most scenes are still shading-limited. Tensor cores are not the reason for the performance boost under Ampere; I’m happy to be corrected.
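The “most scenes are still shading-limited” argument is just Amdahl’s law. A tiny sketch with made-up illustrative fractions (not measurements) shows why faster intersection hardware only pays off on intersection-heavy scenes:

```python
# Amdahl-style model: if only a fraction of the frame time is ray
# intersection, even a big speed-up of that step gives a modest overall gain.
def overall_speedup(intersect_fraction, intersect_speedup):
    shade_fraction = 1.0 - intersect_fraction
    return 1.0 / (shade_fraction + intersect_fraction / intersect_speedup)

# A "shading-limited" interior scene: say 20% of time spent on intersection.
print(overall_speedup(0.2, 5.0))   # ~1.19x despite 5x faster intersection

# A grass/forest scene dominated by intersection: say 80%.
print(overall_speedup(0.8, 5.0))   # ~2.78x
```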

Redshift is probably failing to use the vast number of CUDA cores as well as Cycles is. Ampere SM partitions can do 32 FP32 ops per clock, or 16 INT32 and 16 FP32 when mixed. I wonder if Redshift’s shortcuts, which proved so effective under previous architectures, are now an Achilles’ heel: does their code use a mix of INT32 and FP32, while Cycles mostly uses FP32 maths and as such gets maximum value from the new architecture?
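A toy model of that idea, based on the whitepaper’s description of an Ampere SM partition (16 FP32-only lanes plus 16 lanes that can issue either FP32 or INT32 each clock). Real warp scheduling is far more complicated than this; it’s only meant to show the shape of the trade-off:

```python
def ops_per_clock(int_share):
    """Total ops/clock for one SM partition, given the INT32 share of the
    instruction mix. Datapath A: 16 FP32-only lanes. Datapath B: 16 lanes
    that run either FP32 or INT32."""
    if int_share <= 0.5:
        return 32.0               # both datapaths can be kept busy
    return 16.0 / int_share       # the INT32-capable datapath bottlenecks

print(ops_per_clock(0.0))   # 32.0 -- a pure-FP32 kernel
print(ops_per_clock(0.5))   # 32.0 total, but only 16 of them are FP32
```

On Turing the split was fixed at 16 FP32 + 16 INT32 per partition, so a pure-FP32 kernel left half the lanes idle; that is where Ampere’s headline FP32 gain comes from, and why an INT32-heavy renderer would see less of it.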

The Redshift benchmark shows about a 30% uplift over the 2080 Ti, which has understandably caused a bit of disappointment over on the Redshift forum. The benchmark barely uses RT cores, so it’s mostly a benchmark of CUDA, and just doing the numbers game of CUDA core counts you’d expect much more, given the sheer number of extra cores in Ampere.
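Playing that numbers game explicitly, with the published core counts (assuming the Ampere card in the benchmark is the 3080, and ignoring clock differences):

```python
cores_2080ti, cores_3080 = 4352, 8704   # published CUDA core counts

naive_expectation = cores_3080 / cores_2080ti
print(naive_expectation)                # 2.0 -- exactly twice the cores

observed = 1.30                         # ~30% uplift reported on the forum
print(observed / naive_expectation)     # 0.65 -- only ~65% of naive scaling
```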

The 3090 benchmark seems to show Cycles gets an almost linear speed-up from Ampere’s CUDA cores. Cycles has probably leapfrogged Redshift in performance. I must say, Ampere is a far more impressive compute architecture than gaming architecture, IMO.

I can’t wait to see how Renderman XPU performs on Ampere.


What if AMD brought the SSG to the mainstream? Someone was dropping some very heavy hints on Twitter.

Moving data from disk to GPU is something everyone is doing and it would be interesting to see if Sony’s design influences AMD enough to bring the SSG back to life.

Better to just be patient. Leaks suggest models with double the VRAM are on the way, and Nvidia also seems to have “Super” versions ready to counter AMD’s card reveals.

Pulled from the GA-102 whitepaper

Nvidia made a huge leap in the past two years with the introduction of RT/Tensor cores and the accompanying software/driver support, and it’s up to software makers to leverage this power. Game developers are still scratching their heads over how to take full advantage of all this new stuff (after all, the hardware-accelerated part is still just two years old), and the gaming benchmarks showcase that varying level of implementation from one developer to another.

Looking at the “offline” rendering benchmarks gives a better idea of what to expect from the new architecture (and even here there is room for improvement: DLSS, RTX IO, etc.).

In Blender (in my opinion), if we talk “speed” on a scale of 1–10 for the things that could be accelerated by the GPU:

  • Cycles rendering (7)
  • Simulations (2)
  • Viewport (4)

I probably forgot lots of other areas but those are the ones that came to mind right now.


RTX IO is built upon Microsoft’s DirectStorage, which comes with DirectX 12 Ultimate from the Xbox Series X.
It is Microsoft you should be thanking.

Also, there is more to the PS5’s SSD than what is in your post.


and that is?

It’s the developers knowing that every single PS5 in existence has this SSD technology, while PC developers know that RTX IO is a pipe dream in PC land. Therefore PS5 developers will optimise their games for SSD streaming and PC developers won’t.

It’s the age old story.

There do appear to be several hardware optimisations Sony has made so that the mid-range hardware in the console will compete with the highest end of the PC master race.

If you look at the Steam Hardware Survey, there’s very little evidence of the PC master race. If you were to build a PC from the most popular components on the survey, it would be obliterated by the new consoles. PC developers are targeting this level of PC, not some uber-high-end PC that Nvidia may have used to produce the RTX IO chart.

DirectStorage will be part of DX12, and even if it isn’t widely adopted at first, there will certainly be new cross-platform games that aim to support it.

It will probably slowly become the norm, but nothing would happen without an established API.

How long have SSDs been around? Many games still don’t assume an SSD is available.

The PS5’s SSD performance and features will be used from day one; that’s the huge advantage consoles have with fixed hardware.

Yeah, I’m not disagreeing with that. I was just pointing out that there wasn’t an API before that allowed SSDs to be utilized to their full potential. Also, once console games start to properly design around the tech, we can expect the SSD demands of PC games to increase greatly.

Hopefully we’ll see some of this tech also make its way into Blender in some shape or form.

Load times on the Series X seem good. Loading a whole game in 8 seconds should be fast enough to load objects on the fly like the PS5. They say Gears 5 is 4x faster without any code changes, which means it’s not even using Microsoft’s DirectStorage yet. So with code changes, maybe 100x faster?

The resume feature on the Series X looks great too.
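For a feel of the scale involved in these load-time claims, here’s a rough comparison for a hypothetical 20 GB of game data, using the vendors’ quoted typical effective throughput after hardware decompression (Series X ~4.8 GB/s, PS5 ~8–9 GB/s; the HDD figure is a generic last-gen drive):

```python
data_gb = 20.0  # hypothetical amount of game data to load

# Effective GB/s after decompression (vendor-quoted typical rates).
rates = {
    "HDD (last gen)": 0.1,
    "Xbox Series X": 4.8,
    "PS5": 9.0,
}

for name, rate in rates.items():
    print(f"{name}: {data_gb / rate:.1f} s")  # ~200 s, ~4.2 s, ~2.2 s
```

The hardware alone explains the order-of-magnitude drop in load times; the remaining gap between consoles is small enough that game design, not raw bandwidth, will decide whether it matters.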

Except for the Spiderman demo, I don’t see any videos or official reports of actual PS5 loading times. It would be nice to know how easy their new co-processor is to put into a game. Did the Spiderman team spend thousands of man-hours to get that 1-second load time? If it’s too hard, no games are going to use it. It would also be good to know how the same game’s load times compare on each system. If the difference is a 1-second load versus an 8-second load, I don’t really care; Xbox letting me use my old controller, plus Game Pass, makes up for it big time.

Since Microsoft’s DirectStorage is built into Windows, I don’t see any reason game devs shouldn’t use it. So PS5, Xbox, and PC all have fast storage in one form or another, which means devs should jump on the wagon pretty quickly. Games always push new tech. If anything, the special co-processor in the PS5 is the odd man out, if it takes special sauce to get it working in a game.

Yep, I think it’s telling that even the new Ratchet & Clank’s dimension shifts have a generic interstitial level that is shown during the load before the transition to the new dimension. I don’t want to downplay the speed increase, but it doesn’t appear to be instantaneous.