Thanks for all the responses!
The 4K/2160p version uploaded to YouTube was actually upscaled from the original 1080p. I had to do this to avoid losing all the fine detail: YouTube's compression is very aggressive, so its 4K stream looks roughly like a properly encoded 1080p file. Rendering even at 1080p was painful enough, though.
I started working on this more than 1.5 years ago and tried to record the hours I spent on the project. The total is around 930 hours of actual work. It could have been less if I had been more experienced from the start.
Most of the frames were rendered on my own machine. Originally I had a single GTX 770 GPU, and later I bought a used GTX 670 as well. I can't give exact hours, but the total is probably at least one year of 24/7 rendering on the GTX 770 plus six months on the GTX 670. I also tried to use every CPU I had access to, but they were very weak compared with the dual-GPU setup.
For the clouds scene I had to spend some money renting additional computing power (fortunately not a lot); otherwise I would have had to wait another 2 months.