Display referred data has a minimum and maximum value tied to a particular device. Think of it as using the light ratios of your screen. What happens on your display when you go brighter than its maximum? You can’t; it is essentially meaningless. Now imagine trapping all of your work within those light ratio limitations.
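To make that limitation concrete, here is a toy sketch (the values are hypothetical, and the range is normalised to [0.0, 1.0] for illustration) showing what happens to values "brighter than maximum" in a display referred range:

```python
import numpy as np

# Display-referred values live in a fixed, device-bound range,
# here normalised to [0.0, 1.0]. Anything "brighter than maximum"
# simply cannot be represented -- it clips away.
pixels = np.array([0.25, 0.9, 1.7, 6.0])  # hypothetical pixel values

display_referred = np.clip(pixels, 0.0, 1.0)
# The 1.7 and 6.0 are both flattened to 1.0; their ratio is gone.
```

Once the clip happens, the distinction between a bright value and a very bright value is destroyed for good, which is exactly the trap of working display referred.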
Scene referred is an entirely different creature. Look out a window on any given day. What is black? What is white? They don’t exist. Just ratios from some low value to some massive value. On another day, or another planet, those ratios could be radically different.
Now think of a camera (a device referred medium) capturing the same scene above. You have to make a calculated effort to decide what you intend to make white, what you intend to make black, and what you intend to place at the middle of the camera’s dynamic range of recording.
In that last context, we undergo a transform that takes us from the scene referred domain (outside the window) to the device / display referred domain (camera / display). That transform is critical to keep clear in your head; there is always a dividing line between the scene referred domain and the display referred domain.
There are good reasons to do all of your work on scene referred (scene linear) data values, which is why the transform in modern workflows happens only on the view, not on the data, until you bake it into an encoded image.
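The idea of a view transform can be sketched in a few lines. This is a deliberately simplified stand-in (a Reinhard-style tone curve plus the sRGB transfer function, not any production pipeline such as an OCIO display/view chain): the data stays scene linear, and the transform is applied only when producing something for a display.

```python
import numpy as np

# Hypothetical scene-linear pixel values: open-domain light ratios
# with no fixed "white" -- values may exceed 1.0 by any amount.
scene_linear = np.array([0.0, 0.18, 1.0, 4.0, 16.0])

def view_transform(rgb):
    """Toy view transform mapping open-domain scene-linear values
    into the closed [0, 1] display-referred domain.

    Uses a simple Reinhard tone curve followed by the piecewise
    sRGB encoding -- stand-ins for a real view transform."""
    tone_mapped = rgb / (1.0 + rgb)  # compress [0, inf) into [0, 1)
    # Piecewise sRGB transfer function (encoding for an sRGB display).
    return np.where(
        tone_mapped <= 0.0031308,
        12.92 * tone_mapped,
        1.055 * np.power(tone_mapped, 1.0 / 2.4) - 0.055,
    )

# The working data is untouched; only the *view* of it is
# display referred.
display_referred = view_transform(scene_linear)
```

Note that `scene_linear` itself never changes; every grade or composite would operate on those open-domain ratios, and the display-referred result only exists on the way out to the screen or the encoded file.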
If you manage to get the scene referred versus display referred division clear in your head, you are well on your way to removing all confusion. From there, it is much easier to build up knowledge, including fully comprehending colour spaces, HDR10 / Dolby Vision encoding, and so on.