Making a spectrogram with Geometry Nodes

Hi all

I’ve been trying to make a functional spectrogram in Blender for about a year using Geometry Nodes, and it’s nearly finished.

Here’s what it looks like in b3.6:

The principle is quite simple: I’m using a spectrogram image as a height map, and I offset the UVs at the right speed to match the music.

Here’s the image used in the video above:

I’m using FFmpeg to generate the spectrogram image with the showspectrumpic filter.
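For reference, a minimal invocation looks something like this (sketched in Python via subprocess; the size and file names are placeholders, not my exact settings):

import subprocess

# Render 'input.wav' to a spectrogram image using FFmpeg's
# showspectrumpic filter (size and file names are placeholders).
subprocess.run([
    'ffmpeg', '-y',
    '-i', 'input.wav',
    '-lavfi', 'showspectrumpic=s=2048x1024:legend=0',
    'spectrogram.png',
], check=True)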

I’m currently working on an addon to go with it, hopefully I can release a v1.0 this year. Both the addon and the node setup will be available for free.

I’ll try to share updates here, along with details and explanations on my node setup. This was a very fun project to work on, and I’m eager to share it with the community.

Critiques and questions are more than welcome. Also, if you have ideas for functionalities to implement, feel free to share.

damn… that’s a neat trick

welcome to the forum @holybobine!
Really great setup, well done!

I’m wondering what results it would bring on an image sequence, to analyze, say, luminance.
If GN can read image sequences, then with simulation nodes it might be possible to make something a bit similar to your setup.

OK, I’ve overcome a BIG roadblock.

Since the beginning, I’ve wanted my spectrogram to display volume in dB somewhat reliably.

In order to do that I had to understand the relationship between:

  • the volume of the sound being played (in dB)
  • the luminance of the spectrogram image (which is used for displacement)

My idea was to retrieve the peak volume of the sound, store it in the geo node, and use it to properly calibrate the displacement.

Step 1: Volume

Getting the max volume was pretty easy. I used FFmpeg’s volumedetect filter, which outputs details about the sound’s volume, including a “max_volume” value. I had to parse and format the output text, but I was able to detect and store the value in Blender.
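Roughly, that step looks like this (a simplified sketch, not my exact addon code; the helper name is arbitrary):

import re
import subprocess

def get_max_volume(filepath):
    """Run FFmpeg's volumedetect filter and return max_volume in dB."""
    # volumedetect prints its report on stderr; '-f null -' discards the audio
    result = subprocess.run(
        ['ffmpeg', '-i', filepath, '-af', 'volumedetect', '-f', 'null', '-'],
        capture_output=True, text=True,
    )
    match = re.search(r'max_volume:\s*(-?[\d.]+)\s*dB', result.stderr)
    return float(match.group(1)) if match else None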

In order to test this, I generated multiple variations of the same audio file, each normalized to a slightly lower volume.

In total, I made files ranging from 0 dB to -48 dB, which I renamed accordingly. Then it was just a matter of running volumedetect on each file and making sure the output max_volume matched the name of the file.
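A quick way to produce such a series (a sketch using FFmpeg’s volume filter; not necessarily the exact normalization I used):

import subprocess

# Generate attenuated copies of a 0 dB source, from 0 dB down to -48 dB
# in 3 dB steps (file names are placeholders).
for db in range(0, 49, 3):
    subprocess.run([
        'ffmpeg', '-y', '-i', 'source.wav',
        '-af', f'volume=-{db}dB',
        f'test_-{db}dB.wav',
    ], check=True)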

The results I got were very consistent, within a +/- 0.5 dB error margin.

Step 2: Luminance

Before we go any further, I’ll explain the way I’m doing the displacement in Blender.

In the case of my spectrogram, I’m only interested in the luminance of the image to apply a vertical displacement. In other words, I need to convert the RGB output of my image into a float value containing the luminance.

For now, let's simply add the R, G and B channels together, and divide the result by 3.

There are many ways to calculate the luminance of an image based on RGB values; we’ll keep it simple for now.


I now had to test whether the output luminance was consistent with the max_volume I obtained earlier.

As explained above, I’m using FFmpeg’s showspectrumpic filter to generate the spectrogram image.

Using my previously created sounds, I generated spectrograms going from 0 dB to -48 dB. My idea was to compare the maximum luminance of the image (i.e. the highest point of displacement) with the max_volume value.

In order to test this, I created multiple spectrograms in the same scene, each using a different audio file, and compared the max_volume with the luminance:

As you can see, the luminance is ALL OVER THE PLACE. I was expecting a straight line, where the max_volume is somewhat proportional to the luminance, kinda like this:

max_volume    max luminance
0 dB          1.0
-3 dB         0.95
-6 dB         0.9
…             …
-45 dB        0.55
-48 dB        0.5

And so began my wonderful quest on the path of luminance…

Step 2.a: Color Mode

First problem: when generating the spectrogram, FFmpeg applies a “color mode” to each pixel.

From what I understand, it takes 2 steps to get a pixel’s color:

  1. it first calculates its luminance based on the sound volume
  2. it then looks up a color table set by the user, which determines what color the pixel should have
    (you can see the color tables on the showspectrumpic github page)

In other words, FFmpeg applies a gradient to the pixel based on its luminance, which breaks the proportionality with the max_volume.
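To make that concrete, here’s my rough mental model of that second step (illustrative Python, not FFmpeg’s actual code; the table values are made up):

# The pixel's luminance indexes into a color table, interpolating
# linearly between the stops. The table values below are made up.
COLOR_TABLE = [
    (0.0, (0.0, 0.0, 0.0)),  # luminance 0.0 -> black
    (0.6, (1.0, 0.0, 0.0)),  # luminance 0.6 -> red
    (1.0, (1.0, 1.0, 0.0)),  # luminance 1.0 -> yellow
]

def apply_color_mode(luminance):
    for (p0, c0), (p1, c1) in zip(COLOR_TABLE, COLOR_TABLE[1:]):
        if p0 <= luminance <= p1:
            t = (luminance - p0) / (p1 - p0)
            return tuple(a + (b - a) * t for a, b in zip(c0, c1))
    return COLOR_TABLE[-1][1]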

From here I tested 2 different color modes to try and get consistent results.

“intensity”

This is the default color mode.

While it looks quite cool, I had a hard time extracting consistent brightness levels from it.

Sadly, this makes it unreliable when combined with the max_volume value.


“channel”

I think this mode outputs the raw luminance of the pixels, without applying any color table to them.

It looks a bit dull, but should be perfect for what I need to do.


I ended up using the “channel” color mode, since it’s closest to what I need to achieve.

Step 2.b: Blender’s color space

Another culprit in my quest for luminance.

By default, Blender uses the "sRGB" color space when an image is imported.

While it looks fine on most images, the problem is that it alters the pixel values, hence messing up the luminance.


To fix this, I made a script to change the color space of all the images in the scene:

import bpy

# Switch every image in the file to the "Linear" color space so its
# pixel values are read as-is, without the sRGB transform.
for i in bpy.data.images:
    i.colorspace_settings.name = 'Linear'

The “Linear” color space outputs the pixels as is. From what I’ve tested, “Raw” and “Non-Color” work just as well.

Step 2.c: Success

With all that done, I was able to get perfectly consistent results between the max_volume and the luminance, as shown here:

With a bit of fiddling in the geo nodes, I’m now able to normalize the displacement to 1.0, whatever the sound volume!!!
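For the curious, the idea behind the calibration is something like this (a sketch assuming the luminance scales with the sound’s linear amplitude, which I haven’t verified against FFmpeg’s internals):

# Convert the stored max_volume (in dB) to a linear peak and divide by
# it, so the loudest point displaces to exactly 1.0.
# Assumption: luminance is proportional to linear amplitude.
def normalization_factor(max_volume_db):
    linear_peak = 10 ** (max_volume_db / 20)  # e.g. -6 dB -> ~0.5
    return 1.0 / linear_peak

# in the node tree: displacement = luminance * normalization_factor(...)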

Step 3: Cool colors

Now that I’ve obtained consistent results with the “channel” mode, I’d like to get the same with the “intensity” mode. In order to do this, I’d have to find out how to extract a proper brightness value.

For now, the formula used above doesn’t work:

(R + G + B) / 3

I tried a formula to convert RGB to YUV, to get the Y component (which corresponds to brightness). The formula looks like this:

R*0.299 + G*0.587 + B*0.114

It doesn’t work either, so I think I’ll have to dig deeper into FFmpeg’s source on GitHub to understand how the pixel colors are encoded, and try to come up with another formula.

Either way, this is only for style points, so for now I’ll focus on developing the addon.

Thank you for reading !

edit: I wrote this way too early in the morning, so forgive me if there are any mistakes…

2024 UPDATE!

I’ve made a lot of progress since last time, mainly technical choices regarding both the geo nodes and the addon. In this post I’ll talk about displacement and gradient color.

1. Displacement

In my previous post I go on and on about color modes, linear vs sRGB, luminance, intensity, blah blah…

Turns out I did a full 360° and went back to the default sRGB color space. I owe a lot to this post where user gandalf3 explains how Blender implicitly converts the RGB channels of an image to a single channel, allowing us to use it as a float value.

In essence, Blender extracts the image luminance using this formula:

Y = 0.2126*R + 0.7152*G + 0.0722*B

Here’s the example gandalf3 shows in his post:

Knowing this, I ended up going back to FFmpeg’s “intensity” color mode (style points, yay) AND chose to keep Blender’s default sRGB color space. From my testing, this gives very consistent results between the audio volume and the displacement (as in, lower volume == lower displacement).

The result is way better in my opinion, and we can better “see” what is heard in the audio:

This comes with a drawback, though:

Since I don’t quite understand the relation between the audio volume and the displacement I’m applying, the dB scale displayed will surely NOT be accurate. I might fix it in the future if I find a better solution.

2. Gradient

Now that I’ve settled on a solution for the displacement, I was able to implement a tool in my addon to control the spectrogram’s gradient.

The shader is quite simple: I take the normalized displacement (here the “spectroHeight” attribute), multiply it by the gain, and then mix it with a mask (so the extruded sides of the spectrogram remain black).

I then exposed the color ramp node in my addon using this code (I’ve named my color ramp node “STM_gradient”):

cr_node = bpy.data.materials['STM_shader'].node_tree.nodes['STM_gradient']
box.template_color_ramp(cr_node, "color_ramp", expand=False)
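For context, here’s roughly where that snippet lives, inside a panel’s draw method (the panel class itself is hypothetical; only the two lines above are my real code):

import bpy

class STM_PT_gradient(bpy.types.Panel):
    """Hypothetical panel wrapping the snippet above."""
    bl_label = "Gradient"
    bl_space_type = 'VIEW_3D'
    bl_region_type = 'UI'
    bl_category = "Spectrogram"

    def draw(self, context):
        box = self.layout.box()
        cr_node = bpy.data.materials['STM_shader'].node_tree.nodes['STM_gradient']
        box.template_color_ramp(cr_node, "color_ramp", expand=False)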

And finally I built some functions to reset the gradient and offer some presets to the user:

gradient_addon_demo

Here’s my function to reset the gradient:

import bpy

def reset_gradient():
    print('-INF- Reset gradient')

    cr_node = bpy.data.materials['STM_shader'].node_tree.nodes['STM_gradient']
    cr = cr_node.color_ramp

    # RESET COLOR RAMP
    # Delete all stops, starting from the end, until 2 remain
    for i in range(len(cr.elements) - 2):
        cr.elements.remove(cr.elements[-1])

    # move and set color for remaining two stops
    cr.elements[0].position = 0.0
    cr.elements[0].color = (0, 0, 0, 1)

    cr.elements[1].position = 1.0
    cr.elements[1].color = (1, 1, 1, 1)
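A function like this would typically be hooked up to a button through an operator, something like this (a minimal sketch; the idname and label are mine):

import bpy

class STM_OT_reset_gradient(bpy.types.Operator):
    """Hypothetical operator exposing reset_gradient() as a button."""
    bl_idname = "stm.reset_gradient"
    bl_label = "Reset Gradient"

    def execute(self, context):
        reset_gradient()
        return {'FINISHED'}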

I’m very happy with the result. I think the displacement looks great when combined with the music; you really get to “see” the sound you hear. And I’m very thankful Blender offers a Color Ramp template; it’s very cool to play with the gradients in real time directly in the addon.

I’ll be back soon with another update. Thanks for reading !

I noticed the first video I posted went missing, so I made another demo:

240122_demo_stm

You can see me use the 3 main controls that manipulate the image UVs (a rough sketch of the underlying math follows the list):

  • Lin to Log: apply a log function on the V axis so it better displays low frequencies
  • Freq Min / Freq Max: remap the V axis to display the desired frequency range
  • Audio Sample: scale and offset the U axis to display a sample of the spectrogram
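For those wondering about the math, here’s roughly what those controls do to the UVs (illustrative code, not my literal node tree; the function names and the Nyquist constant are assumptions):

NYQUIST = 22050.0  # half of a 44.1 kHz sample rate; the image's V axis spans 0..NYQUIST

def sample_v(y, freq_min, freq_max, lin_to_log):
    """Map a display coordinate y in [0, 1] to a V coordinate in the image."""
    if lin_to_log:
        # logarithmic display: equal screen space per octave
        freq = freq_min * (freq_max / freq_min) ** y
    else:
        # linear display across the selected frequency range
        freq = freq_min + y * (freq_max - freq_min)
    return freq / NYQUIST  # the image stores frequency linearly up to Nyquist

def sample_u(x, start, length, duration):
    """Scale and offset the U axis to show a time window of the audio."""
    return (start + x * length) / duration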

I’ll go into more detail in my next update.

Thanks for reading !
