Each tile in the scene is rendered independently, so once the GPU finishes all its tiles it has to wait for the CPU to finish its share. You can make the tiles smaller so that the CPU finishes around the same time as the GPU, but tile size is a trade-off: CPUs tend to do better with small tiles, GPUs tend to do better with large tiles. If you want to use both you’ll have to find a size that strikes a balance.
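To make the waiting problem concrete, here is a rough toy model in Python. It is not how Cycles actually schedules work, just a sketch that assumes each device grabs the next tile as soon as it is free. The function name `simulate_render` and all the per-tile timings are made up for illustration.

```python
import heapq


def simulate_render(num_tiles, workers):
    """Toy model: every worker takes the next tile as soon as it is free.

    workers is a list of (name, time_per_tile) pairs. The timings are
    hypothetical and use arbitrary time units.
    Returns (total_time, finish_time_per_worker).
    """
    # Heap entries: (time the worker becomes free, index, name, time per tile).
    # The index breaks ties so that names are never compared.
    heap = [(0, i, name, cost) for i, (name, cost) in enumerate(workers)]
    heapq.heapify(heap)
    finish = {name: 0 for name, _ in workers}

    for _ in range(num_tiles):
        free_at, i, name, cost = heapq.heappop(heap)
        done = free_at + cost
        finish[name] = done
        heapq.heappush(heap, (done, i, name, cost))

    # The render is only done when the slowest worker finishes its last tile.
    return max(finish.values()), finish


# Large tiles (assumed): 11 tiles, GPU needs 4 units per tile, one CPU thread 32.
gpu_only_large, _ = simulate_render(11, [("GPU", 4)])
hybrid_large, per_worker_large = simulate_render(11, [("GPU", 4), ("CPU", 32)])

# Quarter-size tiles (assumed): 44 tiles. The CPU scales perfectly (8 per tile),
# the GPU loses efficiency on small tiles (2 per tile instead of the ideal 1).
hybrid_small, _ = simulate_render(44, [("GPU", 2), ("CPU", 8)])

print("GPU only, large tiles: ", gpu_only_large)
print("GPU + CPU, large tiles:", hybrid_large, per_worker_large)
print("GPU + CPU, small tiles:", hybrid_small)
```

With these invented numbers the GPU alone needs 44 units with large tiles. Adding the CPU with large tiles takes 64, because the GPU is done at 36 and then sits idle while the CPU finishes the last tile it picked up. Shrinking the tiles removes most of the idle time but the hybrid run still takes 72, since the GPU itself got slower. Real timings depend on the scene and hardware, so the numbers only show the shape of the trade-off.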
Small tiles are not a good solution because they slow down GPU rendering too much; it’s faster to render on the GPU alone. Reducing threads to 2 also seems to be one way of reducing the problem. Maybe combining both can make GPU + CPU rendering faster, I need to try.
I still feel the core of the problem is the way Cycles behaves. You can reduce the problem, but waiting for the CPU will still cause a slowdown, and I imagine this would be a place for an easy fix that would improve many render times.
To my surprise I was able to find a sweet spot (16x16) where GPU and CPU work perfectly together. I didn’t expect small and large tile sizes to be equally fast on the GPU, especially when the sizes in between are slow.