Metal vs OpenCL/OpenGL

Herbert, Apple have improved Metal in FCPX so it runs better than OpenCL. But these figures are only Metal vs Apple OpenCL and not OpenCL on Win or Linux.

It is well documented that the Apple OpenCL drivers were extremely buggy and less well developed than those on either Windows or Linux. The Blender devs know…

Those games are all based on the same graphics API OpenCL though, right?

But yes, people have run identical programs, one for Windows and one optimized for MacOS and Metal, against each other on the different platforms. Check the few videos posted above. There are plenty more online. When an identical program is run against itself, one build on Windows and one optimized for MacOS and Metal, the MacOS version has been considerably faster.

As Apple users this is all we are really wanting to say to the Blender Foundation.

Yes, I understand. In one of the links DaVinci Resolve is tested on the same 2019 Mac Pro with CUDA capable GPUs, and the result is that CUDA runs the task faster than OpenCL, and OpenCL runs it quicker than Metal.

On a 2017 iMac Pro OpenCL again beats Metal by a fair amount.

Pure 3D games seem to run faster on Windows DirectX compared to Metal on the same Mac hardware (Windows via Boot Camp), according to one of the other links.

Which leads me to question whether software running Metal actually will be that much faster on the same hardware, because even the less well developed OpenCL Mac drivers seem to outperform Metal in DaVinci.

Granted, that was an older version of both DaVinci and Metal, but I can’t see Metal suddenly becoming that much faster. The gains seem marginal in these cases.

That is, FCPX seems very much optimized for Metal. Which means the onus lies with the developers as much as the hardware?

We’ll just have to wait and see until the new Mac Pro is released and actual independent benchmarks are published with hard numbers. Until then we can only speculate.

So far we do have some hard numbers by Barefeats, and Metal’s performance seemingly isn’t anything to write home about except in some outlier cases. Or am I wrong?

I agree with a lot that you are saying here.
But I did check that last link, the one dealing with DaVinci Resolve, and noticed it was DaVinci Resolve 14 (BMD’s first step towards converting to Metal, in 2017 if I’m not mistaken).
DaVinci Resolve 16.1 is the latest version at this moment and utilizes Metal 2, which has seen gains in performance.
The test was also using an early beta version of Mojave, whereas Mac users are now on Catalina.

Either way, those are some interesting results comparing OpenCL with Metal while using Resolve, so I decided to do a few quick renders and post my results. Take these for what they are: one identical system running two separate operating systems. My results aren’t universal and shouldn’t be taken as a professional benchmark.
I did try to be very cognizant of the different codecs used between the systems and approached it how I would on any professional job.
I also tried to keep this as pure as possible by not including an eGPU since these can vary in performance between the two Systems.

System used for both MacOS and Windows 10 Pro:
2017 iMac 5K 1TB SSD
Intel i7-7700K @4.20 GHz
Ram: 40GB
AMD Radeon Pro RX580

NLE used is DaVinci Resolve 16.1 for both systems.
Video used for testing - 3:14 min 1080p video filmed with ProRes HQ (3.6 GB)
I used H.264 as the final export codec for all tests, since it’s the most common industry standard. And I didn’t want to be rendering a million different codecs. :slight_smile:

MacOS Catalina Render Times
Metal: ProRes to H.264 00:22 min/secs
OpenCL: ProRes to H.264 00:24 min/secs

Windows 10 Pro with the latest AMD drivers for OpenCL
OpenCL: ProRes to H.264 02:38 min/secs
OpenCL: DNxHR HQ to H.264 02:35 min/secs
Also tried using “optimized media” but the results were exactly the same.

Interesting to see OpenCL on MacOS being almost equal with Metal. I’m glad I ran this test because I wasn’t sure if Resolve would still let me select OpenCL in the preferences.
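To put the gaps in perspective, here is a minimal Python sketch that turns the posted times into seconds and into multiples of the Metal time. The function and variable names are just illustrative, and the numbers are single runs from one machine, so treat the ratios as rough.

```python
def to_seconds(stamp: str) -> int:
    """Convert a 'MM:SS' render time into whole seconds."""
    minutes, seconds = stamp.split(":")
    return int(minutes) * 60 + int(seconds)


# Render times as posted above (ProRes HQ source, H.264 export).
render_times = {
    "MacOS Metal": "00:22",
    "MacOS OpenCL": "00:24",
    "Windows OpenCL": "02:38",
}

# Use the Metal run as the baseline and express every run relative to it.
baseline = to_seconds(render_times["MacOS Metal"])
slowdown = {
    name: to_seconds(stamp) / baseline for name, stamp in render_times.items()
}

for name, factor in slowdown.items():
    seconds = to_seconds(render_times[name])
    print(f"{name}: {seconds} s, {factor:.2f}x the Metal time")
```

On these figures OpenCL on MacOS is within about 10% of Metal, while the Windows OpenCL export takes roughly 7.2x as long.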

Thanks for taking the time to find those links. Your point that how well a program is optimized for a given OS may matter more than which graphics API it uses is worth considering.

Tim

Edit: Just for shizz and giggs, I threw the same file into FCPX and it took 00:25 to render… DaVinci Resolve has gotten fast in the last few updates!

Tim

The difference in those export times can be put down to the Mac using Intel’s Quicksync and the Windows version not using it.

You could export to a different codec, e.g. uncompressed, to isolate timeline render performance from export compression performance.

Interesting, I was trying to do that by exporting to DNxHR HQ then running the test again. As someone who cuts more on Windows do you recommend another codec for me to try?

Just repeat the test and save to uncompressed.

I don’t believe Resolve on Windows or Linux supports Quicksync yet. They do use hardware encoding (GPU), but it’s not as efficient. I believe Apple makes it easy for developers to use it. The difference between using Quicksync and not using it is night and day, on the order of the difference your tests show. BTW, your iMac will outperform the iMac Pro and the new Mac Pro in this test.

Will do.
I didn’t know Windows isn’t using QS yet, good to know.

Not sure what you mean? Are the newer Macs getting handicapped by Intel?

Thanks for the tests. In the end I suppose the many software and hardware layers are complicated to compare directly. Blender runs generally better on Linux than on Windows. But not in all areas.

Metal is, as far as I am aware, less function-rich than either OpenCL or CUDA, which may result in a developer having to spend more time optimizing things. Perhaps.

eCycles runs (cycles :slight_smile: ) circles around regular Cycles. eCycles CUDA is as fast as RTX Cycles. The role of the developer is incredibly important on top of all this complexity.

I don’t believe in one hardware/OS/software solution to “rule them all”. It depends on the job at hand, and the software used. And on your OS preferences as well, of course.

It took me a sec to get why you would want me to export to uncompressed, since how efficiently a system can compress footage on export is part of the package. But, as we both know, compressing the footage is just one part of the export. After running the Windows ProRes to uncompressed export and getting almost the same time as the MacOS OpenCL H.264 export, I added 5 Gaussian Blur nodes (GPU based) to each test, so “Quick Sync” (as you referred to it earlier) wouldn’t be tainting the results.

So here are some updated tests, and honestly I’m surprised by the gap narrowing. :slight_smile:

All hardware, software, and footage as before:

MacOS: Metal (5 G Blur Nodes)
ProRes to Uncompressed - 00:29
ProRes to H.264 - 00:29 (weird that it was identical to the Uncompressed time)

MacOS: OpenCL (5 G Blur Nodes)
ProRes to Uncompressed - 00:31
ProRes to H.264 - 00:31 (same weirdness of the compressed and uncompressed times being equal)

Windows: OpenCL (without the 5 Blur Nodes)
ProRes to Uncompressed - 00:25. Like I said, this gave me almost the same time as the MacOS OpenCL H.264 export from my first test (24 seconds), which didn’t include the Blur Nodes.

Windows: OpenCL (5 G Blur Nodes)
ProRes to Uncompressed - 00:44
ProRes to H.264 - 04:48

Things I’m taking away from these personal benchmarks:
Some of the software’s optimization is helping for sure, as seen in other renders and in the playback speed I notice when working on the two systems, but it’s not the whole story. Windows not taking advantage of Quick Sync the way MacOS does is causing a big discrepancy in render times whenever the footage has to be compressed.
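A rough way to separate the two effects, as a Python sketch: it assumes export time is simply timeline render plus encode, with no overlap between the two (that additive model is my assumption, not something Resolve documents), and the names are only illustrative.

```python
def to_seconds(stamp: str) -> int:
    """Convert a 'MM:SS' render time into whole seconds."""
    minutes, seconds = stamp.split(":")
    return int(minutes) * 60 + int(seconds)


# (uncompressed export, H.264 export) with the 5 Gaussian Blur nodes, as posted.
runs = {
    "MacOS Metal": ("00:29", "00:29"),
    "MacOS OpenCL": ("00:31", "00:31"),
    "Windows OpenCL": ("00:44", "04:48"),
}

# Assumption: H.264 export = timeline render + encode, so the extra time
# over the uncompressed export approximates the cost of H.264 encoding.
encode_overhead = {
    name: to_seconds(h264) - to_seconds(uncompressed)
    for name, (uncompressed, h264) in runs.items()
}

# Extra time the 5 blur nodes added to the Windows uncompressed export.
windows_blur_cost = to_seconds("00:44") - to_seconds("00:25")

for name, seconds in encode_overhead.items():
    print(f"{name}: about {seconds} s spent on H.264 encoding")
print(f"Windows blur nodes: about {windows_blur_cost} s")
```

Under that assumption, H.264 encoding adds nothing measurable on MacOS but about 244 seconds on Windows, while the blur nodes themselves cost Windows only about 19 seconds.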

I will concede that the Windows gap isn’t as large once certain factors are taken into account. But the underlying issues that Windows hasn’t addressed for the film and video industry are still present and do cause a significant performance drop, since rendering the footage out to a manageable codec is part of any workflow.

Tim

QuickSync is not a Mac vs PC thing, it’s a workstation vs consumer CPU thing. Workstation-class CPUs from Intel do not have QuickSync, and of course no AMD CPUs have QuickSync. The iMac Pro and the as-yet-unreleased Mac Pro will not have QuickSync, so maybe get on your high horse, take a ride to Intel and AMD, and tell them to put native H.264 encoding on their workstation chips. I’m not defending Blackmagic for not supporting QuickSync on PC; they should, but it’s not a Mac vs PC issue, it’s a Resolve issue. *I’ve been reading the Resolve forums and it seems Resolve on the PC does have QuickSync support, so I’m not sure why you’re not getting good export speed. Maybe it’s a Resolve Studio feature?

At a professional level most Resolve suites would be exporting uncompressed, ProRes4444, or the equivalent Avid DNx variant so the lack of QuickSync is moot. The compression for output is done down stream.

It appears you are also using the free Resolve; as I understand it, there’s a lot of codec support missing in the free version compared to Studio. I have an AMD processor, so it doesn’t have Quicksync, but I have never experienced anything other than very fast export times, at least 2-3x faster than real time at 4K. I have no explanation as to why your Windows export times are so long for 1080p.
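For reference, the earlier H.264 times can be put in the same "times real time" terms with a small Python sketch. It uses the 3:14 clip length and the first-test times from the earlier post; the names are illustrative, and a 1080p result is not directly comparable to a 4K one.

```python
def to_seconds(stamp: str) -> int:
    """Convert an 'M:SS' or 'MM:SS' time into whole seconds."""
    minutes, seconds = stamp.split(":")
    return int(minutes) * 60 + int(seconds)


# Length of the 1080p test clip, as posted.
clip_length = to_seconds("3:14")

# First-test export times (ProRes HQ to H.264, no blur nodes), as posted.
export_times = {
    "MacOS Metal": "00:22",
    "MacOS OpenCL": "00:24",
    "Windows OpenCL": "02:38",
}

# Real-time factor: seconds of footage exported per second of wall-clock time.
realtime_factor = {
    name: clip_length / to_seconds(stamp) for name, stamp in export_times.items()
}

for name, factor in realtime_factor.items():
    print(f"{name}: {factor:.2f}x real time")
```

That works out to roughly 8.8x real time for MacOS Metal and about 1.2x for Windows OpenCL on that clip.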

Here’s a recent enough Mac vs PC benchmark against several PCs vs Pro Macs that sheds more light.

Puget Systems have a benchmark comparing Vega with recent Nvidia GPUs, and while Vega punches above its weight, Puget do not put AMD GPUs in their systems due to power draw and heat. I always take these statements with a pinch of salt because Nvidia are known to twist arms. The data is on their site.

Benchmarks are only valid for a short period of time, as later software revisions can make them redundant very quickly. All I can say is that when editing 4K XAVC files from our Sony FS7s, Resolve Studio flawlessly edits and grades the footage on 2x 1080 Tis. I don’t feel I need more hardware to get the job done; it’s only 3D work that makes me want to build a faster PC.

Thanks for the thorough reply, all great things to consider.
I guess never really having to worry about quick sync left me in the dark about it.
These tests were performed on Resolve Studio, so I’m not sure on that one either. Certain compression codecs may very well be our culprits here.

For someone like you, sending the final straight to distribution, where the distributor transcodes to the needed formats, is ideal, and I’m jealous.
But I do almost exclusively live events, along with corporate editing. The former is where exporting to H.264 and quick turnaround times, combined with real-time playback (I seriously couldn’t do my job without the latter), are requirements. Maybe that’s why I see so many Mac workstations combined with FCPX being used for event work?
Most times I’ll be cutting highlight footage while the event is still in progress, so it’s available for purchase or up on the web in a reasonable time. I don’t have time to transcode the footage coming in to a reasonable codec. This requires me to work with the footage with real-time playback while working on a laptop, as is. So speed and performance on mobile machines is a huge priority.
So yeah, you’re lucky getting to work with higher end codecs all the way through your workflow.

The times when events are using FS7 or FS5 exclusively, and I’m not having to work with multiple codecs from multiple branded cameras, no one dares use ProRes over H.264 for fear of running out of disk space while out on course. (Not taking into account wired vs wireless transmission needs.)

So what my rambling is getting at, and at least for me, price vs performance together with price vs professional capability is pretty big in my little corner of the production world.
That’s why I and so many others are exclusively using Macs in paid work, and I’m only now using Windows during my transition into the 3D world.

Tim

Regarding performance differences on operating systems, each system has its own strengths based on the tradeoffs chosen for its demographic.

Linux tends to work better with server workloads, but its graphics subsystem is a mess. Mac OS has the best audio interface. Windows has the best support for graphics.

Comparing gaming performance on Mac/Linux vs Windows isn’t exactly fair, because AMD/NVIDIA have an entire team essentially rewriting parts of major game titles to run/benchmark faster - on Windows.

Developers won’t be optimizing that much for less relevant and more complicated APIs. You can even see this with D3D12, which often lags behind D3D11 implementations, because D3D11 is more important, but also because it’s easier to work with.

Metal is a really nice API, but the economics of supporting it well don’t really play out in most cases.

Vulkan is sort of a worst case scenario here: Difficult to work with and fairly irrelevant. Add an abstraction layer on top (MoltenVK) and you have the perfect recipe for inferior performance.

It’s ironic that we now have far more graphics interfaces than hardware vendors, when the point of graphics interfaces was to get rid of all the hardware-specific interfaces.

Except that Apple released Metal before Vulkan so the argument falls flat. MS does nothing to support Vulkan but due to the nature of how their OS is designed they do nothing to impede Vulkan support either, but I can count applications that use Vulkan on one hand and still have fingers to spare.

Metal will get used. It may not get used by all cross-platform applications, but Maxon, Autodesk, Epic, Unity, Adobe, and The Foundry have all taken steps to use the API. Whether Blender sees the benefit of using the API doesn’t really matter; what should matter is whether Blender wants to cater to Apple users or not. If they do, then they have no choice but to support Metal, period. Even if they use MoltenVK they will still have to do work, test, etc.

I don’t agree with this. You’d have a leg to stand on if CUDA worked on other hardware, like AMD’s. But the fact is you are trading one lock-in for another. People complain about Metal but then complain about not having CUDA because they are locked into Nvidia’s proprietary API. Does no one see the irony?

It seems people are very choosy about which company they want to be locked into.

Yes. This.

If Apple made the hardware, I could understand having an exclusive API.
Nvidia makes the hardware, so they make the API.

While I may prefer an open API, I can understand when a hardware vendor makes a closed API for their hardware.

Apple is making a closed API, while not making the hardware.

That is the difference.

Based on what Brecht and Ton have said, the plan is to use MoltenVK and Vulkan for Mac. But more than one source has said that Vulkan is optimized for gaming, and there are hints that there may also be some work done on Vulkan and VK as well as the conversion, so maybe the performance won’t be reduced so much.

Apple knew that the Vulkan specification was being worked on, just as it was known that Microsoft was working on DirectX 12 which is Microsoft’s counterpart. Even if they don’t have an official Vulkan version, they actively prohibit it. And that’s the issue.

Every developer knows the benefits of using the official APIs, including the Blender developers. But it requires a massive investment and being forced to make such an investment is the issue! MoltenVK will also require quite some testing and for sure some adjustments. But it is reasonable to assume that the required effort is orders of magnitude smaller.

Hmm… you must have a lot of fingers on that hand then:

As an example, practically every major game engine supports Vulkan, with ‘id Tech 7’ using it exclusively.