It isn’t that you aren’t being clear; we simply have different contexts, and that can make communication tricky at first. Terminology is another big problem, which is why I try to stick to well-defined terms. That might be tricky for some as well, but in the end we all end up better off when we use better terminology.
There is much here to unpack, so I will try to tackle it point by point to avoid assumptions.
> other compositing softwares
Just like file formats, not all software is equipped to handle things in a contemporary way. While scene referred workflows are possible in Nuke, Houdini, Fusion, and possibly others, the nature of the colour means that it is up to the pixel crafter to make sure their work is correct. That is, the software that can do scene referred workflows doesn’t enforce them, because it is impossible for the software to know what the data means or what the pixel crafter is trying to do. Photoshop, After Effects, and other software cannot properly handle scene referred workflows due to historical legacy issues, among other reasons.
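To make that concrete, here is a minimal sketch using the PyOpenColorIO (OCIO v2) Python bindings of what taking control looks like: the pixel crafter states explicitly what the input data means rather than letting the software guess. The colour space names here are hypothetical and depend entirely on the OCIO config actually loaded.

```python
# A minimal sketch with the PyOpenColorIO (OCIO v2) Python bindings.
# The names "sRGB" and "lin_rec709" are hypothetical; they must match
# whatever OCIO config is actually in use.
import PyOpenColorIO as OCIO

config = OCIO.GetCurrentConfig()  # resolved from the $OCIO environment variable

# State explicitly what the incoming pixels mean, and what we want out.
processor = config.getProcessor("sRGB", "lin_rec709")
cpu = processor.getDefaultCPUProcessor()

pixel = [0.5, 0.5, 0.5]             # display-referred, sRGB-encoded values
scene_linear = cpu.applyRGB(pixel)  # decoded to scene-linear
print(scene_linear)
```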
> usually have options to decode a footage before
Once you realize that all we are dealing with are numbers, you can see that we must always deal with the data before using it. Some software might make certain assumptions about the data in secret (think about Blender and “default”, for example, or what “sRGB” means, etc.), but that ends up making for dumb software that is rigid and breaks. Better software permits informed pixel pushers to exert full control over their data inputs. In the quoted post, “decode” was used to describe colour management; it is a series of transforms that happens behind the scenes.
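As one small, concrete piece of that series of transforms, here is a sketch of undoing the sRGB transfer function with numpy. A real decode also involves primaries, white point, and more; this is only the transfer function portion.

```python
import numpy as np

def srgb_decode(encoded: np.ndarray) -> np.ndarray:
    """Inverse of the sRGB transfer function (IEC 61966-2-1)."""
    encoded = np.asarray(encoded, dtype=np.float64)
    return np.where(
        encoded <= 0.04045,
        encoded / 12.92,
        ((encoded + 0.055) / 1.055) ** 2.4,
    )

# A middle code value of 0.5 decodes to a much smaller scene-linear value.
print(srgb_decode(np.array([0.5])))  # ~0.2140
```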
In the case of outputs, similar issues exist. Raytracing engines, or anything dealing with photographic / physically plausible scenes, need strict control over the creative and technical outputs. This ranges from everything you see on your screen to how the data ends up encoded in particular files. Compare, for example, Blender’s absolutely broken DPX output to the DPX output configuration in Fusion. All contemporary compositing software permits controlling the view for display output, as well as fine-grained file format output.
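As a sketch of the output side, here the view for display is collapsed into one deliberately naive function: exposure, a hard clip, then the sRGB encoding. The hard clip is a placeholder; a proper view transform maps intensities far more gracefully.

```python
import numpy as np

def srgb_encode(linear: np.ndarray) -> np.ndarray:
    """Forward sRGB transfer function (IEC 61966-2-1)."""
    linear = np.asarray(linear, dtype=np.float64)
    return np.where(
        linear <= 0.0031308,
        linear * 12.92,
        1.055 * linear ** (1.0 / 2.4) - 0.055,
    )

def naive_view(scene_linear: np.ndarray, exposure_stops: float = 0.0) -> np.ndarray:
    """A deliberately naive 'view': exposure, clip, encode for an sRGB
    display. A real view transform would map intensities far better
    than a hard clip."""
    adjusted = scene_linear * (2.0 ** exposure_stops)
    return srgb_encode(np.clip(adjusted, 0.0, 1.0))

# Middle grey pushed up one stop, ready for an sRGB display.
print(naive_view(np.array([0.18]), exposure_stops=1.0))  # ~0.634
```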
> bypass the step and apply the colour management to the rendered frames only leaving the footage “as it is”?
It cannot be bypassed.
Even the simplest example of taking a set of data to your display is complex under the hood. Think about rendering, for example, in the case of a raytracing engine.
In a raytracing engine, we aren’t shooting a single beam of light from the screen into the scene, but rather three. Each of the reddish, greenish, and blueish lights is a specific colour in colour science terms. What colours are they? Are the other data sets in our scene aligned to them?
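To make “what colours are they?” concrete, assuming a BT.709 / sRGB working space, the published linear RGB to CIE XYZ matrix states exactly which colours the three lights are; each column is the XYZ tristimulus of one light.

```python
import numpy as np

# The three "lights" of a BT.709 / sRGB working space, expressed as the
# standard linear-RGB-to-CIE-XYZ matrix (D65 white), as published in
# IEC 61966-2-1 / ITU-R BT.709.
RGB709_TO_XYZ = np.array([
    [0.4124, 0.3576, 0.1805],
    [0.2126, 0.7152, 0.0722],
    [0.0193, 0.1192, 0.9505],
])

# Each column answers "what colour is this light?" in colour science
# terms: the XYZ tristimulus of a pure reddish, greenish, blueish light.
for name, column in zip("RGB", RGB709_TO_XYZ.T):
    print(name, column)
```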
Now think about the return trip out. What are the lights in the display? Are they the same lights as in the reference scene? Further, did we map the light intensities in a way that is creatively acceptable?
Consider that every single Apple product since 2015 has differently coloured lights in its display compared with the assumptions made in your software. Were the lights transformed correctly so that the colours of the three lights align, giving the creator / audience a chance at WYSIWYG?
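Here is a sketch of what “transformed correctly” means, assuming a BT.709 source and a Display P3 panel (the lights in those post-2015 Apple displays), using the published RGB to XYZ matrices for both:

```python
import numpy as np

# Published linear-RGB-to-XYZ matrices (both D65 white).
RGB709_TO_XYZ = np.array([
    [0.4124, 0.3576, 0.1805],
    [0.2126, 0.7152, 0.0722],
    [0.0193, 0.1192, 0.9505],
])
P3_TO_XYZ = np.array([
    [0.4866, 0.2657, 0.1982],
    [0.2290, 0.6917, 0.0793],
    [0.0000, 0.0451, 1.0439],
])

# "Were the lights transformed correctly?" means routing through XYZ:
# BT.709 lights -> XYZ -> Display P3 lights.
RGB709_TO_P3 = np.linalg.inv(P3_TO_XYZ) @ RGB709_TO_XYZ

# A pure BT.709 red is NOT (1, 0, 0) on a Display P3 panel; sending
# the numbers through untouched would display the wrong colour.
print(RGB709_TO_P3 @ np.array([1.0, 0.0, 0.0]))  # roughly [0.82, 0.03, 0.02]
```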
All of this comes down to the simple fact that smart software that tries to make assumptions for the pixel pushers is stupid software; it is rigid, it breaks, and it cripples the pixel pushers.
When someone asks “Why bother?”, the answer is that proper process will:
- Enable the pixel pusher to have control over their work
- Save time
- Save (in some cases) money
- Elevate the work
- Open up new creative options
TL;DR: Good software permits pixel pushers to control their inputs and outputs with granularity. It is up to the pixel pushers to embrace that and expand their knowledge to control the situation and avoid making rubbish.