A lot depends on how the animated objects need to interact with the footage. If the objects are strictly “in front,” you can just layer them over the top and you’re done. But if they need to appear to move in Z-space, that is to say, in front of and behind things in the footage, then some form of masking is necessary.
There are two ways to achieve the masking. One is the video/film technique of “chroma-key” or bluescreen, where a specific color is selected and pixels of that particular color are replaced with information from some other layer. Here, the masking information comes from the video footage.
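To make the chroma-key idea concrete, here is a minimal sketch in plain Python. It assumes images are nested lists of (R, G, B) tuples. The function name `chroma_key`, the pure-blue key color and the `tolerance` value are all illustrative choices of mine, not anything a particular package uses. Real keyers are a good deal more sophisticated (soft edges, spill suppression), so treat this as the bare principle only.

```python
def chroma_key(foreground, background, key=(0, 0, 255), tolerance=60):
    """Replace foreground pixels close to `key` with the background pixel.

    Images are lists of rows; each row is a list of (R, G, B) tuples.
    `key` and `tolerance` are illustrative defaults, not standard values.
    """
    limit = tolerance ** 2
    result = []
    for fg_row, bg_row in zip(foreground, background):
        row = []
        for fg, bg in zip(fg_row, bg_row):
            # Squared distance in RGB space between this pixel and the key.
            dist2 = sum((a - b) ** 2 for a, b in zip(fg, key))
            # Close enough to the key color: show the other layer instead.
            row.append(bg if dist2 <= limit else fg)
        result.append(row)
    return result


footage = [[(0, 0, 255), (200, 50, 50), (10, 10, 240)]]
other_layer = [[(10, 20, 30), (40, 50, 60), (70, 80, 90)]]
keyed = chroma_key(footage, other_layer)
```

In this small example the first and third pixels are near enough to the key blue to be swapped for the other layer, while the reddish middle pixel is kept from the footage.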
Another way is to generate your own mask and apply it to the rendered material, shaping the mask to fit the footage (by hand if necessary, and it often is necessary).
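Applying a mask of your own boils down to a per-pixel blend, roughly as sketched below in plain Python. I am assuming the mask holds one value per pixel between 0.0 (show the footage) and 1.0 (show the rendered object), and that images are nested lists of (R, G, B) tuples. The name `apply_mask` is just for illustration. The hard part in practice is producing the mask for every frame, not this arithmetic.

```python
def apply_mask(render, footage, mask):
    """Blend `render` over `footage` using a per-pixel mask.

    mask value 1.0 -> rendered pixel fully visible
    mask value 0.0 -> rendered pixel hidden (it is "behind" the footage)
    Values in between give a soft edge. This layout is an assumption.
    """
    result = []
    for r_row, f_row, m_row in zip(render, footage, mask):
        row = []
        for r, f, m in zip(r_row, f_row, m_row):
            # Weighted average of the two layers, channel by channel.
            row.append(tuple(round(m * rc + (1.0 - m) * fc)
                             for rc, fc in zip(r, f)))
        result.append(row)
    return result


render = [[(200, 0, 0), (200, 0, 0), (200, 0, 0)]]
footage = [[(0, 100, 200), (0, 100, 200), (0, 100, 200)]]
mask = [[1.0, 0.0, 0.5]]  # visible, hidden, half-covered edge
composite = apply_mask(render, footage, mask)
```

Here the first pixel shows the rendered object, the second shows only the footage, and the third is an even mix of the two, which is what a soft mask edge looks like.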
It is, no matter how you do it, a painstaking and exacting process.