For sky modification you will probably need a mix of keying (luma, individual color channels, etc., depending on the footage) and roto. You are right that tracking is useful here: you can attach masks to trackers and avoid the tedious work of animating camera wobble by hand. In Blender this kind of work is a bit convoluted, but to lay out the general idea, I would do it roughly like this:
- pull a few keys (luma, whichever color channels give the most separation, maybe a chroma key) and see which areas work with a keyed mask and which need additional roto. Mix the different keys together using rough masks to get the best key for each area; you rarely get one good key for the whole image;
- depending on the camera movement, track the footage and create a transform from the track;
- attach that transform to the masks and animate the masks where necessary (for example, where there are moving objects);
- combine the masks with the keys, for example with a Maximum operation;
- use the resulting mask as the factor in a color operation or, better yet, do all the color ops you want on the sky and at the end use the mask to merge the modified sky over the unmodified background (in Nuke this is called a KeyMix operation, and it helps prevent the problems that creep in when every color op is done through the mask).
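
To make the key, Maximum and merge steps a bit more concrete, here is a minimal per-pixel sketch in plain Python. It is not Blender's or Nuke's API: the function names (`luma_key`, `combine_max`, `keymix`, `grade_sky`) are made up for illustration, the Rec. 709 luma weights and the thresholds are assumptions, and a real comp would do the same math on whole images in nodes.

```python
def luma(rgb):
    """Rec. 709 luma of an (r, g, b) pixel, components in 0..1 (assumed weights)."""
    r, g, b = rgb
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def luma_key(rgb, low, high):
    """Soft luma key: 0 below `low`, 1 above `high`, linear ramp in between."""
    t = (luma(rgb) - low) / (high - low)
    return min(1.0, max(0.0, t))


def combine_max(key, roto):
    """Maximum merge: a pixel is masked if either the key or the roto covers it."""
    return max(key, roto)


def keymix(background, foreground, mask):
    """Lay `foreground` over `background` through `mask` (0 = background, 1 = foreground)."""
    return tuple(b + (f - b) * mask for b, f in zip(background, foreground))


def grade_sky(rgb):
    """Placeholder grade (hypothetical values): pull red and green down a little."""
    r, g, b = rgb
    return (r * 0.8, g * 0.9, b)


# One bright sky pixel and one dark foreground pixel.
sky_pixel = (0.7, 0.8, 0.95)
tree_pixel = (0.1, 0.15, 0.05)

for pixel in (sky_pixel, tree_pixel):
    key = luma_key(pixel, low=0.5, high=0.8)   # illustrative thresholds
    roto = 0.0                                 # no hand-drawn mask on these pixels
    mask = combine_max(key, roto)
    graded = grade_sky(pixel)                  # grade everything, unmasked
    result = keymix(pixel, graded, mask)       # mask is applied once, at the end
```

The point of the last two lines is that the grade itself never sees the mask; the mask is used only once, in the final merge.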
Some more ideas: sometimes it is useful to stabilize the image before roto, do the roto on the stabilized footage, then re-apply the original motion to the masks. This helps because it is much easier to see whether your masks actually stick to the objects they are meant to follow, without slipping.
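
The stabilize, roto, re-apply idea can be sketched like this, assuming the simplest possible case of a translation-only track (a real track may also carry rotation and scale, which this ignores). The helper names are hypothetical and not part of any tracker's API.

```python
def track_offsets(track):
    """Per-frame (dx, dy) of a tracked point relative to its position on the first frame."""
    x0, y0 = track[0]
    return [(x - x0, y - y0) for x, y in track]


def stabilize_point(point, offset):
    """Remove the camera offset, i.e. move a point into stabilized space."""
    return (point[0] - offset[0], point[1] - offset[1])


def reapply_to_mask(mask_points, offset):
    """Put the camera offset back onto mask points drawn in stabilized space."""
    return [(x + offset[0], y + offset[1]) for x, y in mask_points]


# Tracked feature position on three frames (pixels, made-up numbers).
track = [(100, 50), (103, 48), (98, 53)]
offsets = track_offsets(track)

# Mask drawn once on the stabilized footage.
mask_stabilized = [(10, 10), (20, 10), (20, 30)]

# Mask as it should sit on each original, shaky frame.
mask_per_frame = [reapply_to_mask(mask_stabilized, off) for off in offsets]
```

Because the mask is drawn once in stabilized space, any slipping you see there is a real roto problem and not camera wobble.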