After some updates and more training, this is how it looks now:
The scene shown here was rendered at 64 spp, but the training so far has only used 16 and 32 spp examples, so there may still be more potential.