I will start by saying that I am not the one to speak to this authoritatively. I am sure other people can follow up with greater detail as required.
For emission back plates, display referred images are essentially broken.
As textures, they require more fumbling and jumbling, and the result will never be quite decent.
An easy way to spot this is to try Filmic with various surfaces, using their diffuse reflectance (albedo) values. Frequently folks find that their surfaces are reflecting back far too much light, and their textures end up blowing out.
This is because albedo needs to be close to physically plausible values when hit with physically plausible light levels; real diffuse surfaces rarely reflect more than about 90% of the incoming light, with fresh snow near the top of that range. If you crawl this forum, you will find more than a few discussions on the matter.
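If you want to sanity check your own textures, a rough sketch along these lines works, assuming NumPy and a texture already decoded to linear albedo values (the helper and the thresholds here are my own illustration, nothing official):

```python
import numpy as np

# Hypothetical helper: assumes `texture` is a float array of *linear*
# albedo values in [0, 1], already decoded from any display encoding.
def flag_implausible_albedo(texture: np.ndarray,
                            upper: float = 0.9,    # fresh snow is around 0.9
                            lower: float = 0.02):  # charcoal is roughly 0.02-0.04
    """Return the fraction of texels outside plausible reflectance."""
    too_bright = np.mean(texture > upper)
    too_dark = np.mean(texture < lower)
    return too_bright, too_dark

# A texture authored at near-white (0.98) bounces back almost all
# incoming light, which no real diffuse surface does.
tex = np.full((512, 512, 3), 0.98)
print(flag_implausible_albedo(tex))  # (1.0, 0.0): essentially everything too bright
```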
When we place a camera in front of an object to photograph it for a texture, we want as close to a purely linear recorded response as we can get, because a raytracer is very sensitive to those albedo values. However, because many photographs aren’t carefully handled, the result is an aesthetic response designed to be looked at, not used as data. That aesthetic response is unavoidable unless we use the raw file the camera can record; aesthetic decisions are baked into the hardware and software of every single camera you use, and those decisions are baked into every JPEG you save from them.
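To be clear about what “linear” means here: the sRGB transfer function part is easy to undo, but that alone does not recover data from a JPEG, since the camera’s tone curve and other aesthetic choices are still baked in. A minimal sketch of the decode, using the standard IEC 61966-2-1 formula:

```python
import numpy as np

def srgb_to_linear(v: np.ndarray) -> np.ndarray:
    """Invert the standard sRGB transfer function (IEC 61966-2-1).

    This only removes the sRGB encoding; it does NOT undo the camera's
    own tone curve, white balance, or other aesthetic decisions baked
    into a JPEG, which is exactly the problem described above.
    """
    return np.where(v <= 0.04045,
                    v / 12.92,
                    ((v + 0.055) / 1.055) ** 2.4)

# A mid-grey JPEG code value of 0.5 decodes to roughly 0.214 linear.
print(srgb_to_linear(np.array(0.5)))
```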
So while you could use any old JPEG or random photograph you find out in the real world, it isn’t going to easily deliver the values you actually need, despite looking like an interesting photograph. The same applies to the countless people over the years who have tried to massage in a display referred background emission plate, or run into any other such workflow issue.
There are a number of issues with a low bit depth pipeline that can creep in, not the least of which is the means by which alpha is stored within Blender. Your life will be significantly smoother if you stick to EXRs, for example, instead of dealing with the myriad of problems that crop up from bit depth quantisation issues and the like.
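To get a feel for the quantisation side of it, here is a quick sketch comparing the worst case error of an 8 bit integer round trip against a half float one (pure NumPy, my own illustration):

```python
import numpy as np

# Round-trip a linear gradient through an 8 bit integer encoding,
# and through half float, then compare against the original values.
linear = np.linspace(0.0, 1.0, 100000, dtype=np.float64)

eight_bit = np.round(linear * 255.0) / 255.0
half = linear.astype(np.float16).astype(np.float64)

print(np.abs(eight_bit - linear).max())  # ~0.00196: one part in 510, everywhere
print(np.abs(half - linear).max())       # ~0.00049 at worst, far finer in the shadows
```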
Getting back to the point, a typical still photo from a DSLR delivers 12-14 bits of depth, and can be encoded reasonably correctly for texture use if you have the raw file. Once encoded, it can be saved into a half float EXR and ends up being a great asset to have on hand.
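As a rough sketch of that pipeline, assuming the rawpy and OpenImageIO Python bindings are available (the filenames are placeholders, and you would want to sort out white balance and colour space properly for your own camera):

```python
import numpy as np
import rawpy                  # assumed available: pip install rawpy
import OpenImageIO as oiio    # assumed available: OpenImageIO Python bindings

# Decode the camera raw to linear RGB: gamma (1, 1) applies no transfer
# curve, and auto-brightening is disabled so the values stay data.
with rawpy.imread("texture_shot.dng") as raw:  # placeholder filename
    rgb16 = raw.postprocess(gamma=(1, 1),
                            no_auto_bright=True,
                            use_camera_wb=True,
                            output_bps=16)

# Normalise the 16 bit integers to floating point linear values.
linear = rgb16.astype(np.float32) / 65535.0

# Save as a half float EXR, keeping the data scene linear.
h, w, _ = linear.shape
spec = oiio.ImageSpec(w, h, 3, "half")
out = oiio.ImageOutput.create("texture_linear.exr")
out.open("texture_linear.exr", spec)
out.write_image(linear)
out.close()
```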
Starting with incorrect assets will simply make your job much harder, gobbling up more time, or ultimately ending up as one more element in the “That feels odd” pile.
A good example of how a workflow / process can benefit you is the impact a decent view transform can have on your work. How many countless hours have been wasted trying to massage work to fit into the Default view transform? With a stable view transform, “magic” happens with no additional effort; one can drop a CGI model into a scene referred HDRI and have seamless integration.
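In Blender that switch is a one liner via the Python API; a minimal sketch (the HDRI path below is a placeholder):

```python
import bpy

scene = bpy.context.scene

# Pick a stable view transform instead of the Default one.
scene.view_settings.view_transform = 'Filmic'
scene.view_settings.look = 'None'
scene.view_settings.exposure = 0.0

# Load a scene referred HDRI as the world background.
world = scene.world
world.use_nodes = True
env = world.node_tree.nodes.new('ShaderNodeTexEnvironment')
env.image = bpy.data.images.load('/path/to/environment.exr')  # placeholder path
background = world.node_tree.nodes['Background']
world.node_tree.links.new(env.outputs['Color'], background.inputs['Color'])
```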
Ultimately what you choose to do is up to you; some folks are content ramming away under the Default view transform using low bit depth nonlinear assets, and have fun making content. Good on them! For those that are able to see the issues though, a change in process can be of tremendous benefit, saving time and significantly elevating the work along the way.