The OP explicitly referred to the VSE or the compositor, which is why I proposed a VSE-based solution. I agree with chip that the compositor offers more control.
Ideally, though, my first choice would be neither the compositor nor the sequencer. If I weren't feeling super lazy (which I usually am), I'd simply use the 3D viewport!
Import the images as planes (or map them onto planes) and animate the planes: move them in and out of the frame, toggle each plane's visibility on and off, create outlines, add background elements such as gradients or moving textures, add captions and/or titles, and add other graphic elements. For finer control, some of the elements could be brought into the compositor for additional tweaking. Of course, this last step depends entirely on the time available to complete the project.
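If you'd rather script the repetitive part, here is a rough sketch of the "move in and out of the frame, toggle visibility" idea in Python. It assumes the planes already exist in the scene and that the camera looks along an axis where sliding on X moves a plane across the frame. The function names (`plan_slideshow`, `apply_plan`), the timing values and the object names are made up for illustration; only `keyframe_insert`, `location` and `hide_render` are actual Blender properties and methods. The planning part is plain Python, so you can sanity-check the timings outside Blender.

```python
try:
    import bpy  # only available when run inside Blender
except ImportError:
    bpy = None


def plan_slideshow(names, hold=48, slide=12, offscreen_x=10.0, start=1):
    """Build keyframe timings for planes shown one after another.

    Returns {name: {"x": [(frame, x), ...], "hidden": [(frame, bool), ...]}}.
    All parameter values are illustrative assumptions, not Blender defaults.
    """
    plan = {}
    frame = start
    for name in names:
        t_in = frame                # plane sits off-screen on one side
        t_centre = t_in + slide     # plane has arrived at the centre
        t_leave = t_centre + hold   # plane starts sliding out
        t_out = t_leave + slide     # plane is off-screen on the other side
        plan[name] = {
            "x": [
                (t_in, -offscreen_x),
                (t_centre, 0.0),
                (t_leave, 0.0),
                (t_out, offscreen_x),
            ],
            # Visible for the whole move, hidden from the frame after it ends.
            "hidden": [(t_in, False), (t_out, False), (t_out + 1, True)],
        }
        frame = t_out  # the next plane starts when this one has left
    return plan


def apply_plan(plan):
    """Insert the planned keyframes on existing objects (Blender only)."""
    if bpy is None:
        raise RuntimeError("apply_plan() must be run inside Blender")
    for name, keys in plan.items():
        obj = bpy.data.objects[name]
        for frame, x in keys["x"]:
            obj.location.x = x
            obj.keyframe_insert(data_path="location", index=0, frame=frame)
        for frame, hidden in keys["hidden"]:
            obj.hide_render = hidden
            obj.keyframe_insert(data_path="hide_render", frame=frame)
```

Inside Blender you would call something like `apply_plan(plan_slideshow(["photo_01", "photo_02"]))`, with the names swapped for whatever your planes are actually called. Outlines, captions and background elements would still be set up by hand or with further scripting.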
FOSS-based motion graphics! 