This has been solved: the cause was narrowed down to masking issues.
Later I will use ShotCut to render the video to stills, then import the JPG stills in place of both clips in the node editor, and report back on whether the issue is resolved. In the meantime, feel free to share your theories!
EDIT: Performance suffered with the above adjustment.
Notice how Blender waits roughly 51 seconds after “Initializing execution” before it begins rendering tiles:
Fra:1 Mem:133.25M (0.00M, Peak 133.25M) | Time:00:00.16 | Compositing
Fra:1 Mem:196.54M (0.00M, Peak 196.54M) | Time:00:00.38 | Compositing | Determining resolution
Fra:1 Mem:196.54M (0.00M, Peak 196.54M) | Time:00:00.38 | Compositing | Initializing execution
Fra:1 Mem:613.29M (0.00M, Peak 613.47M) | Time:00:51.80 | Compositing | Tile 3-8160
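
For anyone who wants to measure this kind of stall in their own console output, here is a rough Python sketch that computes the time gap between consecutive log lines. It is not part of Blender; the function names `parse_line` and `stage_gaps` are made up for illustration, and it assumes the timestamp always has the `MM:SS.ss` form seen in the lines above (longer renders may print a different format).

```python
import re

# Assumption: every status line contains "Time:MM:SS.ss | <stage text>",
# as in the four lines pasted above.
LOG_LINE = re.compile(r"Time:(?P<min>\d+):(?P<sec>\d+\.\d+)\s*\|\s*(?P<stage>.+)$")

SAMPLE = [
    "Fra:1 Mem:133.25M (0.00M, Peak 133.25M) | Time:00:00.16 | Compositing",
    "Fra:1 Mem:196.54M (0.00M, Peak 196.54M) | Time:00:00.38 | Compositing | Determining resolution",
    "Fra:1 Mem:196.54M (0.00M, Peak 196.54M) | Time:00:00.38 | Compositing | Initializing execution",
    "Fra:1 Mem:613.29M (0.00M, Peak 613.47M) | Time:00:51.80 | Compositing | Tile 3-8160",
]


def parse_line(line):
    """Return (elapsed_seconds, stage) for a status line, or None if it does not match."""
    match = LOG_LINE.search(line)
    if match is None:
        return None
    seconds = int(match.group("min")) * 60 + float(match.group("sec"))
    return seconds, match.group("stage").strip()


def stage_gaps(lines):
    """Return (previous_stage, next_stage, gap_in_seconds) for each pair of consecutive lines."""
    parsed = [p for p in (parse_line(line) for line in lines) if p is not None]
    return [
        (prev_stage, cur_stage, cur_time - prev_time)
        for (prev_time, prev_stage), (cur_time, cur_stage) in zip(parsed, parsed[1:])
    ]


# Report the largest gap, i.e. where the compositor spent the most time.
prev_stage, cur_stage, gap = max(stage_gaps(SAMPLE), key=lambda item: item[2])
print(f"{gap:.2f}s between '{prev_stage}' and '{cur_stage}'")
# prints: 51.42s between 'Compositing | Initializing execution' and 'Compositing | Tile 3-8160'
```

Running the same script on a log captured after swapping the clips for stills should make it easy to compare whether the stall moved, shrank, or grew.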