interesting, i’m coding 3d camera based software, and so stumbled on this bit older thread
So they build a 3d environment, based upon a depthgraph, then they probably use a combination of a cheap device, kinect, and combined depthgraph and RGB contrasts as surface detection
(since the kinect is kinda raw not high res depthgraph).
(and likely a kinect since they walked with it and so likely didnt use a laser scanner).
Then next thing to do is store the 3d Pixel, from a few frames back.
Retrieve camera tracking (as Blender can), overlay those images and remove noise by averaging the overlays.
Well kinect is fast in creating depthgraphs 30fps, no problems in that area, a bit coding to match its RGB cam. (part of sdk samples)
While Blender was used for 3d traking (i think), using the fusion sdk examples (MS Kinect 3d framework), might be able to do it without Blender , but nice to see it with blender 
hmm something not shown there that it would be very nice to use this tech to combine it with CG models.
ea have a fake car drive in a bussy street. All kind of crazy car stunts could get a lot easier to film this way.
It could even mean no more green screens required.
Its not top tech as i once saw something a bit more advanced doing the same tricks without 3d input
(build 3d from 2d movie, then apply similar effects). But i think its affordable and make-able tech, by any enthusiast with some C# or c++ coding time. And because of the 3D input maybe near realtime should be possible (but would require FAST cpu horsepower still)