Unfortunately for the render-time conscious, this is to be expected in a unidirectional path tracer.
Indoor scenes with small lights tend to require far more samples (on average 20,000 or more) than, say, an outdoor scene or a scene with large lights. If you want to get away with fewer samples, you will need to wait for features like adaptive sampling to be implemented.
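To get a feel for why the last bit of noise is so expensive, here is a rough sketch. It assumes plain Monte Carlo convergence, where noise falls off roughly as 1/sqrt(N) for N samples; the function names are made up for illustration and have nothing to do with Blender's API.

```python
import math

def relative_noise(samples):
    """Noise level relative to a 1-sample render, assuming 1/sqrt(N) convergence."""
    return 1.0 / math.sqrt(samples)

def samples_for_noise_reduction(current_samples, factor):
    """Samples needed to cut the noise by `factor` (e.g. 2.0 means half the noise).

    Under the 1/sqrt(N) assumption the sample count grows with the square
    of the desired reduction.
    """
    return math.ceil(current_samples * factor ** 2)

if __name__ == "__main__":
    for n in (100, 1000, 20000):
        print(n, "samples -> relative noise", round(relative_noise(n), 4))
    # Halving the noise of a 20,000-sample render needs four times the samples.
    print(samples_for_noise_reduction(20000, 2.0))
```

So under that assumption, halving whatever noise is left always costs four times the samples you already have, which is why it is tempting to filter the remainder out instead.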
You can, however, try to build your own de-noising setup in the compositor to remove the remaining noise, but making a good setup can be tricky, and most people just use a 2D program with filters. That said, a Blender compositor setup using bilateral blurring or another smoothing filter could actually work here, because you only have about the last 5 percent of the noise left to get rid of.
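If you are wondering what bilateral blurring actually does, here is a minimal pure-Python sketch of the general technique on a small grayscale image (a list of rows of floats). It is not how Blender's Bilateral Blur node is implemented, and the function name, parameters and default values are assumptions picked for illustration.

```python
import math

def bilateral_filter(image, radius=2, sigma_space=2.0, sigma_color=0.1):
    """Edge-preserving blur: average neighbours, weighted by distance AND similarity."""
    height = len(image)
    width = len(image[0])
    result = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            center = image[y][x]
            weighted_sum = 0.0
            weight_total = 0.0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    ny, nx = y + dy, x + dx
                    if not (0 <= ny < height and 0 <= nx < width):
                        continue  # skip neighbours outside the image
                    neighbour = image[ny][nx]
                    # Nearby pixels count more than distant ones.
                    spatial = math.exp(-(dx * dx + dy * dy) / (2.0 * sigma_space ** 2))
                    # Pixels with a similar value count more than very different ones,
                    # which is what keeps hard edges from being smeared.
                    tonal = math.exp(-((neighbour - center) ** 2) / (2.0 * sigma_color ** 2))
                    weight = spatial * tonal
                    weighted_sum += weight * neighbour
                    weight_total += weight
            # weight_total is at least 1.0 because the centre pixel always counts fully.
            result[y][x] = weighted_sum / weight_total
    return result

if __name__ == "__main__":
    # Dark left half, bright right half, with one slightly noisy pixel on the dark side.
    img = [[0.0, 0.0, 0.0, 1.0, 1.0, 1.0] for _ in range(5)]
    img[2][1] = 0.05
    out = bilateral_filter(img)
    print(out[2][1], out[2][0], out[2][5])
```

The small speck gets averaged down because it is close in value to its neighbours, while the dark/bright boundary stays sharp because pixels across it get almost no weight. That is the reason this kind of filter suits low-level leftover noise better than a plain blur.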