3D tracking of moving object with static camera?

I have been using Sebastian’s tutorials, but I need a slightly different “spin” on things: is it possible to track only a moving object from a static camera, i.e. a camera with no movement relative to the environment?

The application is to be able to track and eventually rig a face - but in a static-camera environment, say the equivalent of characters filmed in a studio.

I fully appreciate that the face will not have total freedom of movement, else some of the markers will be lost, but I want to move one step beyond the typical rigid, static faces.

1) I tried letting the camera track the static background, then tracked the moving face. Other than the background, I only seem to be able to get a bunch of face object empties out - but they all lie in a single plane (i.e. perpendicular to the camera’s view axis). That is going to complicate rigging a face too much…

2) Next, I tried cheating: I tracked only the face with the camera (instead of treating the face as a moving object in a static environment). Apart from the camera solve giving me static points for the face with a moving camera (I understand why), I ended up with the following problems:

  • The depth of the control points in the scene was inverted: e.g. the nose-tip point was furthest back
  • The scale along the camera’s Z-axis was far too long, i.e. the face would have to be VERY “deep”

The one thing I have not done (I’m trying to avoid it) is to start off with the face inside an environment while the camera is moving… then carefully put the camera and tripod down… and then capture the face movement. (But I am trying to avoid moving the camera at all.)

Can anybody help?

Thanks!

I haven’t fully explored the possibilities, but here’s the basic setup…

From the Movie Clip Editor, open the Properties panel (N-key) and add an Object under the Objects section. Now add tracking markers on the static object and track them. (If you already have Tracks under the Camera object, select the Camera, select all the Tracks, then Track -> Copy Tracks. Finally, select the Object and Track -> Paste Tracks, all in the Movie Clip Editor.)

Then in the Tools panel (T-key), click the Object Motion button in the Solve section. Switch to Reconstruction mode, select two Tracks, and tweak the Scale until it looks right in the 3D View.

Back in the 3D View, add a Cube, move it to line up with the Tracks, then add an Object Solver constraint. Object should be the Object from the Movie Clip Editor; leave Camera blank. You may need to play with the two Inverse buttons to get it to work correctly.

That will get your Cube (or whatever) to follow the static motion.

What comes next is speculation, but you could add another Object in the Movie Clip Editor, tracking the “dynamic” parts of the face. Don’t Solve this time; instead, just switch to Reconstruction, select the Track(s), and in the Tools panel, Link Empty to Track. A bunch of Empties will be created with Follow Track constraints automatically applied (the Track field is red, at least in my version - I don’t quite know why, but it seems to work anyway). Under Depth Object in the constraint, use the Cube, and the Empty will be projected onto it. I believe the Depth Object field was added specifically to allow Sebastian to rig facial motion (he wrote about it on the Mango blog, as I recall).
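To make the Depth Object idea concrete, here is a minimal sketch (plain Python, not Blender API code - all names are illustrative) of what the projection amounts to: cast a ray from the camera through the 2D track position and place the empty where that ray hits the depth geometry. For simplicity the “cube face” is a single plane in front of the camera.

```python
def ray_plane_intersect(origin, direction, plane_point, plane_normal):
    """Return the point where the ray hits the plane, or None if it misses."""
    denom = sum(d * n for d, n in zip(direction, plane_normal))
    if abs(denom) < 1e-9:
        return None  # ray is parallel to the plane
    diff = [p - o for p, o in zip(plane_point, origin)]
    t = sum(d * n for d, n in zip(diff, plane_normal)) / denom
    if t < 0:
        return None  # plane is behind the camera
    return tuple(o + t * d for o, d in zip(origin, direction))

# Camera at the origin looking down -Z; a 2D track at (u, v) on a
# focal-length-1 image plane gives the ray direction (u, v, -1).
camera = (0.0, 0.0, 0.0)
track_uv = (0.1, 0.05)
direction = (track_uv[0], track_uv[1], -1.0)

# Depth "mesh": a plane 4 units in front of the camera (stand-in for the cube).
hit = ray_plane_intersect(camera, direction, (0.0, 0.0, -4.0), (0.0, 0.0, 1.0))
print(hit)  # the empty lands on the plane, 4 units deep
```

This is also why the empties end up flat when the depth object is a flat-faced cube: the depth comes entirely from the geometry the ray hits, not from the track itself.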

Tracking something like a face is usually referred to as Motion Capture, and involves tracking each point as its own autonomously moving point. For standard tracking, each tracked point is assumed to be static in relation to all other points. For object tracking, each point on the object is assumed to remain “rigid” in relation to the other object-tracked points. But for “mocap,” none of the points necessarily correspond with any other points. The face contorts, the muscles shift around, etc.

To solve this sort of motion, you need multiple cameras filming the same action (often called “witness cameras”). The cameras are calibrated together, so each point has several tracked 2D positions per frame and a 3D solution can be obtained. From what I understand, Blender can’t do this yet.

Hi thezodiac,

Thanks a lot for this. I did what you said here and it worked - as far as I could gather from what it is supposed to do :slight_smile:

I am able to get a bunch of tracked empties that perfectly line up with the dots on my face in the background video, even while moving.

The only problem is that the empties now project onto the front of the cube, which is still a flat surface - so there is no depth information associated with the empties. Did you perhaps mean that, in the “real thing”, the cube should be replaced by a face model, since the face mesh itself is not going to be animated?

Similar to a camera moving around in a static environment, I am sure the algorithms should be able to extract meaningful depth information from a moving object such as a face.

I’ll see if I can find more on Sebastian’s Mango blog…

Thanks so far!

Hi benu,

I fully understand what you mean but, as I mentioned in my reply to thezodiac, similar to a camera moving around in a static environment, I am sure the algorithms should be able to extract meaningful depth information from a moving object such as a face.

I am hoping it can be done without stereo video recording.

That’s correct. I didn’t explicitly say it, but ideally the “Cube” should be a mesh that most closely matches the face that you are trying to capture. The Cube was just a demonstration of how to get the idea to work. In Sebastian’s video (link below), note the large black dots at 0:16. Those are the Tracks for the Object Solver constraint: the head band, nose, and ear markers that remain static (relative to each other) as the head moves. A mesh is created to match the shape of the face, and you can see Empties aligned with the cheeks and above the eyebrows. Those are created with Link Empty to Track and somehow used to influence a face rig and deform the face mesh. The textures are then mapped onto the face mesh and deformed according to those Empties, then superimposed over the live action footage.

On a side note, the facial mesh needs to be exact (or at least close) because there’s no way for a computer to determine the geometry from a static camera. The knowledge of how a face is supposed to be shaped is crucial. This optical illusion very nicely shows the effect. Even with all the visual processing of the human eye and brain, you still can’t see what is really happening until there’s a different camera view, so how could a computer do any better?
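The depth ambiguity is easy to demonstrate with a pinhole-projection sketch (plain Python, illustrative only): any point slid along the same camera ray projects to the same 2D image position, so a shallow face and a much deeper one can produce identical footage from a single static camera.

```python
def project(point, focal=1.0):
    """Pinhole projection of a 3D point; camera at the origin, looking down -Z."""
    x, y, z = point
    return (focal * x / -z, focal * y / -z)

nose_near = (0.2, 0.1, -2.0)                  # nose tip, 2 units deep
nose_far = tuple(c * 3 for c in nose_near)    # same ray, three times deeper
print(project(nose_near))
print(project(nose_far))
# both project to (0.1, 0.05), up to floating-point rounding
```

Without a second viewpoint (or prior knowledge like a matching face mesh), nothing in the image distinguishes the two.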

Thanks thezodiac, I think I eventually got it after meditating a bit more about the mail you sent yesterday, but your further explanation also helped.

I’m guessing that one can do it as follows:

  • one has to carefully choose the locations of certain markers that will eventually become IK target bones
  • those markers eventually end up on the carefully placed face mesh itself…
  • then carefully place a head bone, with other bones that “grow” from it to certain control points on the surface of the mesh
  • next, IK-constrain these bones and the head bone to the IK target bones
  • now, with the face mesh parented to the head bone,
  • the mesh - onto which the empties are mapped (and with which the IK target bones move) - follows the head bone, which follows the IK targets

In effect, if everything works out right, the mesh is dynamically moved by the parent head bone, with the targets (and IK targets) staying on the surface of the face mesh.

I hope I am explaining myself clearly! :slight_smile:

(And I hope the markers can actually stay on the surface of a moving object - since they themselves cause the mesh underneath them to move. I hope this does not become unstable, but I cannot test right now…)

PS: those extended antennae on the headband obviously help increase the perspective and resolution of the face.


But wait! After all that explanation, I noticed a tell-tale detail: notice the moving lamp between 16 and 20 seconds! That’s similar to the effect I got when I camera-tracked the object instead of the terrain!

Mmmm - some more thinking may still be required! I wish Sebastian would break the silence on how he did it!

Thanks!