Yes, but how are you doing the perspective warp in the first place? This is where it all begins: the output of that operation must be transparent everywhere outside the screen image.
For a moving camera, the easiest way is a four-corner pin (called cornerpin in most apps) to the four corners of the TV. The corner points can either be animated by hand (tedious) or tracked with 2D trackers, a planar tracker, etc. A planar tracker is usually the best option because it does not need all four corners to be visible at all times (it tracks the whole surface with automatically placed features). Sometimes a full 3D track is helpful, especially when the whole screen is obscured in some frames.
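To make the cornerpin less of a black box: under the hood it is a 3x3 perspective matrix solved from the four corner pairs. Below is a minimal pure-Python sketch of that math, not what any particular app does internally. The function names and the corner values are made up for illustration; in practice the destination corners come from your tracker, one set per frame.

```python
def solve_linear(a, b):
    """Solve a*x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate corner configuration")
        m[col], m[pivot] = m[pivot], m[col]
        pv = m[col][col]
        m[col] = [v / pv for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [m[i][n] for i in range(n)]


def corner_pin_matrix(src, dst):
    """3x3 perspective matrix mapping the four src corners onto the four dst corners."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # u = (h0*x + h1*y + h2) / (h6*x + h7*y + 1), likewise for v
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]


def apply_pin(h, x, y):
    """Map one point of the inserted image into plate coordinates."""
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)


# Hypothetical example: a 1920x1080 insert pinned to tracked TV corners.
image_corners = [(0, 0), (1920, 0), (1920, 1080), (0, 1080)]
tv_corners = [(412.0, 230.5), (980.2, 251.0), (965.7, 610.3), (405.1, 570.8)]
pin = corner_pin_matrix(image_corners, tv_corners)
```

Every pixel of the insert goes through the same matrix, which is why animating or tracking just the four corners is enough.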
The mask is what lets you change the shape of the placed image (the cornerpin output is a warped rectangle, while the screen shape sometimes is not). The mask should be tracked exactly the same way as the screen image itself. Sometimes it is easier to apply the mask to the image before the cornerpin operation, because then the mask stays locked to its position on the image and cannot slide.
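Here is a small sketch of why masking before the cornerpin prevents sliding: the mask is drawn once in the image's own coordinates and then goes through exactly the same pin as the image on every frame. It uses the standard closed-form mapping from the unit square to a quad; the function names and the mask outline are hypothetical, and corners are assumed to be ordered top-left, top-right, bottom-right, bottom-left.

```python
def unit_square_to_quad(quad):
    """Pin coefficients mapping image coords (0..1, 0..1) onto a quad (TL, TR, BR, BL)."""
    (x0, y0), (x1, y1), (x2, y2), (x3, y3) = quad
    dx1, dx2, sx = x1 - x2, x3 - x2, x0 - x1 + x2 - x3
    dy1, dy2, sy = y1 - y2, y3 - y2, y0 - y1 + y2 - y3
    det = dx1 * dy2 - dx2 * dy1
    if det == 0:
        raise ValueError("degenerate quad")
    g = (sx * dy2 - dx2 * sy) / det
    h = (dx1 * sy - sx * dy1) / det
    return (x1 - x0 + g * x1, x3 - x0 + h * x3, x0,
            y1 - y0 + g * y1, y3 - y0 + h * y3, y0, g, h)


def pin_point(coeffs, u, v):
    """Push one image-space point through the pin."""
    a, b, c, d, e, f, g, h = coeffs
    w = g * u + h * v + 1.0
    return ((a * u + b * v + c) / w, (d * u + e * v + f) / w)


# Hypothetical mask outline, drawn once in normalized image space.
mask_in_image_space = [(0.02, 0.03), (0.98, 0.03), (0.98, 0.97), (0.02, 0.97)]


def mask_for_frame(tracked_corners):
    """Mask outline in plate space for one frame, using that frame's tracked corners."""
    coeffs = unit_square_to_quad(tracked_corners)
    return [pin_point(coeffs, u, v) for u, v in mask_in_image_space]
```

Because the mask never gets its own animation, it cannot drift relative to the image; only the four tracked corners change from frame to frame.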
If something obscures the screen during the clip, you also need a mask for those foreground elements. This is where the greenscreen part comes in handy. If the screen is green or blue, you can key it and get a mask for the foreground elements. With a plain black screen (or worse, some completely different colored image) you must roto the foreground elements manually (animate the masks by hand) and use these masks to place the FG on top of the replaced screen. It can be tempting to cut a hole in the screen mask with the FG element masks, but I find it more logical and easier to follow if the shot is composited back to front, placing each layer over the previous one: original clip > screen replacement > FG elements
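The back-to-front order can be sketched per pixel like this. It is only an illustration: the key is a crude green-difference matte with a made-up gain value (real keyers do much more, including edge and spill handling), and the function names are my own, not taken from any app.

```python
def over(fg, alpha, bg):
    """Straight-alpha 'over': fg where alpha is 1, bg where alpha is 0."""
    return tuple(f * alpha + b * (1.0 - alpha) for f, b in zip(fg, bg))


def foreground_matte(rgb, gain=2.0):
    """Crude green-difference key: 0 on the green screen, 1 on non-green foreground."""
    r, g, b = rgb
    greenness = g - max(r, b)
    return min(1.0, max(0.0, 1.0 - gain * greenness))


def replace_screen_pixel(original, new_screen, screen_mask):
    """Layer order: original clip > screen replacement > FG elements."""
    # 1. Screen replacement over the original clip, limited by the tracked screen mask.
    comp = over(new_screen, screen_mask, original)
    # 2. Foreground elements, cut out of the original clip by the key, go back on top.
    return over(original, foreground_matte(original), comp)
```

A green pixel inside the screen mask ends up showing the new screen content, while a skin-tone pixel of a hand passing in front keeps its original color. The screen mask itself never needs a hole cut into it.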