Looking at the lighting situation “as though it really were theatrical lighting on a real stage,” I wonder whether what you’re trying to achieve is being made more difficult than it needs to be … especially given that this is computer graphics, not reality.
My thought (pardon me, I am by trade a computer programmer…) is to break the problem in half. First: “part (a) of the problem is easy … making sure that both characters receive an adequate amount of three-point light.”
Then, separately: “part (b) consists of achieving a nice interplay of shadows that conveys the characters’ shape as they are moving around.”
First, deal with part (a) by setting up the simplest way (area lighting, etc.) to light the scene. Then, deal with (b) by using shadow-only spots (buffered, if you can possibly get away with it…) to generate shadowing information in a separate channel. Now you have two entirely separate “tracks” of information, perfectly aligned, that can be composited together (by whatever node network you please) to modify and/or attenuate that original lighting solution.
In this way, the lighting solution no longer has to consider the detail-conveying shadows solution at the very same time, even though the two of them join forces to produce the final effect. Having generated “first one, then the other,” you can then “season to taste.”
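Since I think in code, here is roughly the arithmetic I have in mind for that last compositing step. It is only a sketch of a multiply-style attenuation, not anything the renderer or the node editor actually exposes: the function name `composite`, the `strength` parameter, and the convention that a shadow value of 0.0 means “fully lit” and 1.0 means “fully shadowed” are all my own assumptions for illustration.

```python
# Hypothetical sketch: the names and the 0..1 value conventions are assumptions,
# not part of any renderer's API.

def composite(lighting, shadow, strength=1.0):
    """Darken a lighting pass by a separately rendered shadow pass.

    lighting: rows of (r, g, b) tuples, each channel in 0..1
    shadow:   rows of floats, 0.0 = fully lit, 1.0 = fully shadowed
    strength: 0.0 ignores the shadow pass, 1.0 applies it in full
    """
    out = []
    for light_row, shadow_row in zip(lighting, shadow):
        row = []
        for (r, g, b), s in zip(light_row, shadow_row):
            k = 1.0 - strength * s  # per-pixel attenuation factor
            row.append((r * k, g * k, b * k))
        out.append(row)
    return out


# A one-pixel "image": full shadow on that pixel, applied at half strength.
lit = [[(1.0, 0.5, 0.25)]]
shadows = [[1.0]]
result = composite(lit, shadows, strength=0.5)
```

The `strength` value is the “season to taste” knob: the lighting pass and the shadow pass are rendered once each, and only this blend is adjusted afterwards.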
I can’t deny that this approach seems “more complicated,” or even “too complicated.” But . . .