The basic process is:
screen space -> minimap space -> camera space -> world space
You can get the mouse position in screen space. From that, you need to work out where on the minimap the player clicked, where that point is relative to the minimap camera, and then where the minimap camera is relative to the world.
If you’re using an RTT (render-to-texture), then the UV coordinates of the plane displaying the minimap map directly onto the minimap camera’s view, i.e. camera space. So you can ray-cast against the plane and extract the hit UV.
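A minimal sketch of that ray-cast in BGE Python. `rayCast(..., poly=2)` really does return the hit UV as the fifth element; the function name and the `dist` default here are illustrative, and `getScreenVect`'s direction convention may need its sign flipped depending on your BGE version:

```python
def minimap_uv_under_mouse(cam, mouse_pos, dist=100.0):
    """Ray-cast from `cam` through the mouse and return the UV of the hit.

    `cam` is a KX_Camera; `mouse_pos` is the normalized (x, y) pair from
    bge.logic.mouse.position. With poly=2, rayCast also returns the hit
    polygon and the hit UV.
    """
    x, y = mouse_pos
    # getScreenVect gives a world-space vector through the given screen point;
    # it conventionally points back towards the camera, hence the subtraction.
    direction = cam.getScreenVect(x, y)
    target = cam.worldPosition - direction * dist
    hit = cam.rayCast(target, cam.worldPosition, dist, "", 0, 1, 2)
    hit_obj, hit_pos, hit_normal, hit_poly, hit_uv = hit
    return hit_uv if hit_obj is not None else None
```

If the ray misses the plane (e.g. the mouse is outside the minimap), `hit_obj` is None and the function returns None, so check for that before converting onward.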
If you’re not using RTT (for example CaveX16, as linked above, which uses viewports), then you have to find another way to map the click into minimap space. I can’t remember exactly what I did in CaveX16, but I probably put a plane where the minimap was and used its UVs anyway. (BTW: viewports are much more performant than RTT, but they come with some limitations.)
Going from camera space to world space is easy if you have an orthographic camera: scale the UV by the camera’s view width (its ortho scale), then offset by the minimap camera’s world position. If you have a perspective camera, you can convert the camera-space point into a vec3, multiply by the inverse camera projection matrix, and then ray-cast from the camera along that direction until it hits something. There is a function in the BGE API for mapping from camera space to a world direction vector, but it only works on perspective cameras, so it isn’t a general-purpose solution.
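The orthographic case can be sketched as a few lines of plain Python. `ortho_scale` here stands in for the minimap camera’s view width (KX_Camera.ortho_scale in BGE); the function name and the `aspect` parameter are illustrative:

```python
def uv_to_world(uv, cam_world_pos, ortho_scale, aspect=1.0):
    """Map a minimap UV in [0, 1] to world XY under an orthographic camera.

    Shift the UV so (0.5, 0.5) lands on the camera centre, scale by the
    camera's view size, then offset by the camera's world position.
    """
    u, v = uv
    x = cam_world_pos[0] + (u - 0.5) * ortho_scale
    y = cam_world_pos[1] + (v - 0.5) * ortho_scale * aspect
    return (x, y)
```

So a click dead-centre on the minimap returns the camera’s own XY, and the corners return positions half an ortho-scale away in each axis.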
You can see the code from CaveX16 here:
It looks like I hardcoded the fact that the camera was pointing world-down, so I could just assume
world_space = click_uv * world_size - which is definitely the easiest solution. (Line 167)
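For completeness, that hardcoded shortcut as a sketch. It assumes a top-down minimap camera whose view covers the whole map with the origin at UV (0, 0); `world_size` (the map’s extent in Blender units) and the function name are illustrative:

```python
def click_to_world(click_uv, world_size):
    """world_space = click_uv * world_size, applied per axis."""
    u, v = click_uv
    return (u * world_size, v * world_size)
```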