I don’t think you need to mess around with extremely complicated things like zoom factor compensation. It seems to me that the basic issue is determining how the 3D motions, represented by certain pixels along the orbital paths, are translated into the 2D path that your labels must follow to stay tied to the moving image. While I don’t think this can be highly automated (the 3D->2D translation, i.e. world coords to screen coords, would be very scene- and image-specific), there may be some ways to speed up a manual animation process.
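Just to make the 3D->2D translation concrete, here is a rough sketch of the math in plain Python. It is not Blender API code, and everything in it is an assumption made for illustration: the function names, a camera that is not rotated and looks straight down -Z with +Y up, the 35 mm lens, the 32 mm sensor width, and the 640x480 frame.

```python
import math

def world_to_screen(point, cam_pos, focal_mm=35.0, sensor_mm=32.0,
                    width_px=640, height_px=480):
    """Project a world-space point to pixel coordinates (origin top-left).

    Assumes an unrotated pinhole camera at cam_pos looking down -Z, +Y up.
    Returns None for points at or behind the camera.
    """
    x = point[0] - cam_pos[0]
    y = point[1] - cam_pos[1]
    depth = -(point[2] - cam_pos[2])  # distance in front of the camera
    if depth <= 0:
        return None
    # Pixels per world unit for a point at depth 1.
    scale = focal_mm / sensor_mm * width_px
    sx = width_px / 2.0 + scale * x / depth
    sy = height_px / 2.0 - scale * y / depth  # screen Y grows downward
    return (sx, sy)

def orbit_screen_path(radius, center_z, cam_pos, n_samples=8):
    """Sample a circular orbit in the XZ plane and project each sample."""
    path = []
    for i in range(n_samples):
        t = 2.0 * math.pi * i / n_samples
        p = (radius * math.cos(t), 0.0, center_z + radius * math.sin(t))
        path.append(world_to_screen(p, cam_pos))
    return path

# A circular orbit seen from a raised camera becomes a lopsided 2D loop.
for sx, sy in orbit_screen_path(5.0, -20.0, (0.0, 3.0, 0.0)):
    print(round(sx, 1), round(sy, 1))
```

Even in this toy case the circle turns into an uneven loop on screen, because samples nearer the camera spread out more than distant ones. With a real, rotating and moving camera it gets messier, which is why I’d lean toward the manual approach.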
The first step I’d take would be to make a reference pass rendering of the full animation, in which only the orbits are rendered. Place reference markers on the orbits: these could be small objects or, if you can do OpenGL renderings through the camera, something like an Empty. These markers show the labels’ “attachment” points. You should be able to get them accurate down to a single pixel along the orbit path.
In your main animation file, you can create a new Scene (let’s call it “Ortho Labels”) that can have an entirely different camera to render from. Set this camera to Orthographic rather than Perspective. Load the reference pass frames as a background image/movie and set it to auto frame advance.
Now you can start animating your labels, be they text objects or textures on small billboard planes, by “tracing” (i.e., rotoscoping) the reference points as they move across the BG image. Since you’re working in ortho view, the 2D BG image is matched by the 2D camera view – no perspective issues, no change in label image size. How many keyframes you need will be determined by how smoothly you want the labels to track the orbits.
For maximum accuracy and smoothness, you might try using Empties to actually track the reference animation, since they are easy to align, then have your labels follow the Empties via parenting or a constraint, whichever gives you more flexibility. You should be able to get a halfway-decent preview of their motion relative to the ref sequence by using ALT+A. If you start by keyframing the first and last locations of the ref points, then key the midpoint, then the midpoints of those two halves, and so on, splitting the keyed motion into equal segments as you go, you can perhaps be more efficient than keying every frame right from the beginning.
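The keying order described above (ends first, then the middle, then the middles of those) can be worked out ahead of time. This is a minimal sketch in plain Python, not tied to any Blender API; the function name and the min_gap parameter are made up for the example.

```python
def bisection_key_order(first_frame, last_frame, min_gap=1):
    """Return frames in the order to key them: both ends, then midpoints.

    Each pass halves every remaining segment. Segments no longer than
    min_gap frames are left for the interpolation to handle.
    """
    if first_frame == last_frame:
        return [first_frame]
    min_gap = max(1, min_gap)  # guard against endless splitting
    order = [first_frame, last_frame]
    segments = [(first_frame, last_frame)]
    while segments:
        next_segments = []
        for start, end in segments:
            if end - start <= min_gap:
                continue  # segment is already short enough
            mid = (start + end) // 2
            order.append(mid)
            next_segments.append((start, mid))
            next_segments.append((mid, end))
        segments = next_segments
    return order

# For frames 1 to 9, stopping once keys are 2 frames apart:
print(bisection_key_order(1, 9, min_gap=2))
```

In practice you would stop as soon as the labels sit on the markers well enough in the ALT+A preview, rather than running the list all the way down.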
This probably sounds like a lot of grunt work and it is. That’s the nature of roto work. But it is probably faster than writing, troubleshooting and debugging a custom BPython solution. Great art = 10% inspiration and 90% perspiration, or so I’ve heard. In my experience, that’s an optimistic ratio.
In your main scene you can set up the Compositor to use the Render Layers (if any) of your 3D animation, then add another Render Layer from the Ortho Labels scene superimposed over that, using whichever blending method works best for you. Alpha Over, Screen, Lighten, and Add are all options, depending on the nature of the labels and what you want them to look like.
Since you’re working in a completely different Scene, I think you can even set the render size (for Ortho Labels) to 2X your main movie size. The BG image will scale accordingly as long as it has the same aspect ratio. Then, in the Compositor, use a Scale node to bring the labels back down to the size needed to match the main animation (50%). This might help smooth out the motion somewhat if necessary, but it might also adversely affect the labels’ image quality. Test first!
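To show why the 2X trick cuts both ways, here is a toy sketch that scales a single row of pixels to 50% by averaging neighbouring pairs. This simple box filter is only an assumption for illustration; the Scale node’s actual filtering may well differ.

```python
def downsample_half(row):
    """Scale a row of pixel values to 50% by averaging adjacent pairs."""
    return [(row[i] + row[i + 1]) / 2.0 for i in range(0, len(row) - 1, 2)]

# A 2-pixel-wide label edge in the 2X render, then moved one 2X pixel right.
frame_a = [0, 0, 0, 1, 1, 0, 0, 0]
frame_b = [0, 0, 0, 0, 1, 1, 0, 0]

print(downsample_half(frame_a))  # label straddles two final pixels
print(downsample_half(frame_b))  # label lands on one final pixel
```

A one-pixel move in the 2X render becomes a half-pixel move in the final frame, which is where the smoother motion comes from. The cost is that a label straddling two final pixels gets spread across both at half strength, so fine text can go soft.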