HTML is a little overkill if all you want to do is show positioned text and images. It’d be much easier if you created your own format and parsed that. Something like:
T(20, 30):Hi Blender!
I(40, 60): /Users/Me/MyImage.jpg
T(100, 20): This is my image!
Here T would be text that you output to the window, and I would store the path of an image you want to display. The part in parentheses would store its position. Then all you have to do is read in the file, scan through the lines, pull the appropriate parts out of each string, and apply the action you want. You could easily add more things like text color (even padding and background color wouldn’t be horribly hard).
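A rough sketch of what reading that format could look like in plain Python. Everything named here (Item, LINE_RE, parse_layout) is made up for illustration, and it assumes the coordinates are whole numbers and that lines it doesn't recognize can simply be skipped. It only parses; the actual drawing is left to whatever Blender calls you end up using.

```python
import re
from typing import NamedTuple


class Item(NamedTuple):
    """One parsed line: kind is 'T' (text) or 'I' (image path)."""
    kind: str
    x: int
    y: int
    payload: str


# Matches lines such as "T(20, 30):Hi Blender!" or "I(40, 60): /path/to.jpg".
# Whitespace around the numbers and after the colon is tolerated.
LINE_RE = re.compile(r"^\s*([TI])\(\s*(\d+)\s*,\s*(\d+)\s*\)\s*:\s*(.*?)\s*$")


def parse_layout(text):
    """Return a list of Item tuples, one per recognized line."""
    items = []
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match is None:
            continue  # assumption: silently skip blank or malformed lines
        kind, x, y, payload = match.groups()
        items.append(Item(kind, int(x), int(y), payload))
    return items


sample = """T(20, 30):Hi Blender!
I(40, 60): /Users/Me/MyImage.jpg
T(100, 20): This is my image!"""

for item in parse_layout(sample):
    if item.kind == "T":
        print("draw text %r at (%d, %d)" % (item.payload, item.x, item.y))
    else:
        print("draw image %r at (%d, %d)" % (item.payload, item.x, item.y))
```

Adding something like text color would then just mean one more letter in the pattern and one more branch in the loop.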
As HouseArrest’s link shows you (if you go to the next page), Python has built-in parsing capabilities (it has an HTML parser as well).
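For reference, here is roughly what using the standard library HTML parser looks like. This is a sketch for Python 3, where the module is html.parser (in Python 2 it was called HTMLParser); the class name TextAndImages and the extract helper are my own inventions. Note that the parser only hands you tags and text as it meets them. It tells you nothing about where anything should go on screen.

```python
from html.parser import HTMLParser


class TextAndImages(HTMLParser):
    """Collects text runs and <img> sources in document order."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs
        if tag == "img":
            self.events.append(("image", dict(attrs).get("src")))

    def handle_data(self, data):
        text = data.strip()
        if text:  # ignore whitespace-only runs between tags
            self.events.append(("text", text))


def extract(html_text):
    """Feed a whole HTML string through the parser and return its events."""
    parser = TextAndImages()
    parser.feed(html_text)
    parser.close()
    return parser.events


page = '<p>Hi Blender!</p><img src="MyImage.jpg"><p>This is my image!</p>'
for kind, value in extract(page):
    print(kind, value)
```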
To really be able to use what you get out of any HTML parser, though, you’re going to have to implement a DOM, write lots of rules, define lots of behaviors, and be quite comfortable with Python and the API. This is not an easy task at all, and quite frankly, Blender does not easily have the capabilities to do everything you need visually (like displaying fonts other than the built-in bitmap one in its script window, which means you’re stuck with a couple of text sizes). I suppose in theory it does have the capabilities, though, if you really know what you’re doing and you can put in the work to figure stuff out.
For right now, unless you’re really in bed with OpenGL, Python, and the API, I’d strongly suggest making your own file format like above. It’s going to save you so many headaches compared to implementing an HTML viewer that follows all the little rules of rendering HTML.
Edit: An external renderer could work, but it depends on whether it’s drawing to hardware or software, whether you can get access to the buffers it’s drawing to, and whether you can then translate them into something drawable with the API or somehow blit them onto the script window.