Whether the low-poly mesh perfectly overlaps the high-poly one matters very little. It can be slightly inside or outside. What matters most is which direction the normals are pointing, and whether the envelope (i.e. ray distance) or a cage allows all the detail to be captured correctly, especially around corners.
This is what happens without a cage:
- First, rays are shot from each low-poly vertex¹ in the same direction as the low-poly normals.
- They travel until they reach the ray distance.
- Then they are reversed, and an intersection is sought between the resulting vector and the high-poly. That intersection point determines the direction of the normal in the resulting map (i.e. the difference is calculated between what would’ve been the low-poly normal at the original point and the high-poly normal at this intersection).
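The steps above can be sketched roughly like this. It's only an illustration in plain Python, not how any particular baker is actually implemented: the names `ray_triangle` and `bake_sample` are made up for this post, the high-poly is reduced to a list of triangles, and only the position of the hit is returned (a real baker would go on to read the high-poly normal there).

```python
def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def add(a, b):
    return tuple(x + y for x, y in zip(a, b))

def scale(v, s):
    return tuple(x * s for x in v)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def ray_triangle(origin, direction, tri, eps=1e-9):
    """Moller-Trumbore test: distance along the ray to the triangle, or None."""
    v0, v1, v2 = tri
    e1, e2 = sub(v1, v0), sub(v2, v0)
    p = cross(direction, e2)
    det = dot(e1, p)
    if abs(det) < eps:           # ray is parallel to the triangle
        return None
    inv = 1.0 / det
    s = sub(origin, v0)
    u = dot(s, p) * inv
    if u < 0.0 or u > 1.0:
        return None
    q = cross(s, e1)
    v = dot(direction, q) * inv
    if v < 0.0 or u + v > 1.0:
        return None
    t = dot(e2, q) * inv
    return t if t > eps else None  # hits behind the origin do not count

def bake_sample(point, normal, ray_distance, high_poly_tris):
    """Hypothetical helper: move out along the low-poly normal by
    ray_distance, then look back along the reversed normal for the high-poly."""
    start = add(point, scale(normal, ray_distance))
    inward = scale(normal, -1.0)
    hits = []
    for tri in high_poly_tris:
        t = ray_triangle(start, inward, tri)
        if t is not None:
            hits.append(t)
    if not hits:
        return None              # the envelope never reached the high-poly
    return add(start, scale(inward, min(hits)))

# High-poly detail sitting 0.1 above a low-poly surface at z = 0.
HIGH = [((0.0, 0.0, 0.1), (1.0, 0.0, 0.1), (0.0, 1.0, 0.1))]
found = bake_sample((0.25, 0.25, 0.0), (0.0, 0.0, 1.0), 0.5, HIGH)
missed = bake_sample((0.25, 0.25, 0.0), (0.0, 0.0, 1.0), 0.05, HIGH)
```

With a ray distance of 0.5 the reversed ray finds the high-poly surface at z = 0.1; with 0.05 the ray starts below that surface and finds nothing, which is the "envelope too small" case.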
This is what allows the baker to find detail regardless of how well the meshes actually overlap. But with your new screenshot, it should be obvious that not all detail can be captured this way when using a flat-shaded low-poly. If you leave the low-poly completely inside like that, all the volume between the low-poly and the high-poly would be skipped; if you scale it up, you’ll lose the corners, as in my example in the previous post. That’s why a cage is needed for such meshes. Of course, you could just shade the whole low-poly cube smooth and bake like that, but then your normal map will consist entirely of gradients instead of flat faces with gradients at the edges, and you’ll have a hard time projecting other detail, as it gets more skewed the closer it is to the cube’s edges.
When using a cage, a relation is established between the vertices of the cage and those of the low-poly. Rays are shot from the cage’s vertices² in the reverse direction of the cage’s normals, and again an intersection is sought with the high-poly. So the cage’s surface has to be outside of the high-poly, otherwise the rays will either miss it or hit some distant polygon.
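To illustrate why the cage has to sit outside the high-poly, here's a minimal sketch in plain Python. It assumes a unit sphere as a stand-in for the high-poly, and the functions `nearest_sphere_hit` and `cage_sample` are invented for this example, so take it as a toy model rather than what the baker really does internally.

```python
import math

def nearest_sphere_hit(origin, direction, center, radius, eps=1e-9):
    """Nearest intersection in front of the ray origin, or None.
    `direction` is assumed to be unit length."""
    oc = tuple(o - c for o, c in zip(origin, center))
    b = sum(x * y for x, y in zip(oc, direction))
    c = sum(x * x for x in oc) - radius * radius
    disc = b * b - c
    if disc < 0.0:
        return None                      # ray misses the sphere entirely
    root = math.sqrt(disc)
    for t in (-b - root, -b + root):     # near root first, then far root
        if t > eps:
            return tuple(o + t * d for o, d in zip(origin, direction))
    return None

def cage_sample(cage_point, cage_normal, center, radius):
    """Hypothetical helper: shoot from a cage point against its normal."""
    inward = tuple(-n for n in cage_normal)
    return nearest_sphere_hit(cage_point, inward, center, radius)

CENTER, RADIUS = (0.0, 0.0, 0.0), 1.0
# Cage point just outside the high-poly: hits the surface right below it.
outside = cage_sample((0.0, 0.0, 1.1), (0.0, 0.0, 1.0), CENTER, RADIUS)
# Cage point inside the high-poly: the ray only finds the far side.
inside = cage_sample((0.0, 0.0, 0.9), (0.0, 0.0, 1.0), CENTER, RADIUS)
```

The outside point lands on the nearby surface at z = 1, while the inside point skips it and ends up on the opposite side of the sphere at z = -1, i.e. the "distant polygon" case.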
In the case of this simple cube, the default Auto-Smooth angle of 30° is indeed enough. But I’m going on the assumption that in practice your meshes won’t be just cubes: not all faces will be perfectly flat, and not all will be smooth either. In that case the auto-smooth angle alone may not always be sufficient (e.g. when you want a sharp edge even though its angle is smaller than the threshold). You’ll then need to mark that edge as sharp; Auto-Smooth respects those marks. But in order to use those marks at all, you have to enable Auto-Smooth; which angle to use for automatic smoothing depends on your mesh. You can even set the angle to 180° and rely solely on marking edges as sharp.
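The rule described above boils down to something like the following. This is a simplified model in plain Python, not Blender's actual code: `edge_is_split` is a made-up name, and treating "angle greater than the threshold" as the cutoff is my assumption about the exact comparison.

```python
import math

def angle_between_deg(n1, n2):
    """Angle in degrees between two unit face normals."""
    d = sum(a * b for a, b in zip(n1, n2))
    d = max(-1.0, min(1.0, d))           # guard acos against rounding
    return math.degrees(math.acos(d))

def edge_is_split(normal_a, normal_b, auto_smooth_deg, marked_sharp=False):
    """An edge gets split normals if it is marked sharp, or if the angle
    between its two faces exceeds the auto-smooth angle (assumed rule)."""
    return marked_sharp or angle_between_deg(normal_a, normal_b) > auto_smooth_deg

up = (0.0, 0.0, 1.0)
side = (1.0, 0.0, 0.0)                   # 90 degrees from `up`, like a cube edge
shallow = (math.sin(math.radians(20)), 0.0, math.cos(math.radians(20)))

cube_edge_default = edge_is_split(up, side, 30.0)                     # True
shallow_default = edge_is_split(up, shallow, 30.0)                    # False
shallow_marked = edge_is_split(up, shallow, 30.0, marked_sharp=True)  # True
cube_edge_180 = edge_is_split(up, side, 180.0)                        # False
```

So the 20° edge stays smooth at the default angle unless you mark it, and at 180° even the cube's edges only become sharp through marks.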
Don’t split edges manually. The exporter will do that for you, based on which edges are marked sharp (or have split normals) and which have UV seams. In fact, if you do use a cage and actually split the edges, you may even ruin the mapping between the low-poly’s vertices and the cage’s. Not to mention that you’d just make editing the low-poly more difficult in case you need to make some additional tweaks.
What I meant by splitting is simply having split normals along the relevant edges (either marked sharp, or with normals edited in such a way that they’re split per-face), not that the edges are physically split. Going by some other software’s terminology (e.g. Max), it’s the same as giving the polygons distinct smoothing groups: in Blender, all faces within such a “smoothing group” are shaded smooth, and all edges on the border of that smoothing group are marked sharp (or have their normals split via normal editing tools).
They will be, of course, eventually split, by the exporter or by the game engine (basically, any data discontinuity at vertex level requires this, be it normals, UVs, vertex colors or some other attribute).
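As a rough illustration of why that split is unavoidable, here's a small plain-Python sketch that counts how many vertices a cube ends up with once each unique (position, normal) pair needs its own vertex. The helper names are invented, UVs and other attributes are ignored, and for the smooth case the position itself is used as a stand-in for the shared averaged normal.

```python
from itertools import product

def cube_corners():
    """Yield (position, face_normal) for all 24 face corners of a cube."""
    for axis in range(3):
        for sign in (-1, 1):
            normal = tuple(sign if i == axis else 0 for i in range(3))
            others = [i for i in range(3) if i != axis]
            for a, b in product((-1, 1), repeat=2):
                pos = [0, 0, 0]
                pos[axis] = sign
                pos[others[0]] = a
                pos[others[1]] = b
                yield tuple(pos), normal

def exported_vertex_count(corners, smooth):
    """Count unique (position, normal) pairs, i.e. vertices after splitting."""
    unique = set()
    for pos, face_normal in corners:
        # Smooth shading: every corner at a position shares one normal
        # (the position stands in for it here). Flat: one normal per face.
        normal = pos if smooth else face_normal
        unique.add((pos, normal))
    return len(unique)

corners = list(cube_corners())
smooth_count = exported_vertex_count(corners, smooth=True)
flat_count = exported_vertex_count(corners, smooth=False)
```

The smooth cube keeps its 8 vertices, while the flat-shaded one ends up with 24, even though nothing was split in the editable mesh.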
You’re welcome 
¹ ² This is of course not done just per-vertex, but interpolated across faces.