My suggestions in short: Separate the characters from the background and make the forms read better. Add more action to tell a story and give the characters a reason for being in the image and in the location they’re shown, one that is immediately apparent to the viewer without second-guessing.
The biggest issue I had when I first saw the thumbnail and the whole image was figuring out the forms and the orientation of the characters, because of the lack of value contrast (which is why the second picture works better). But after looking at it for a while, I think the bigger issue is the story.
It took me the longest time to figure out even a hint of the story. The image is at a dynamic angle, so the poses could be dynamic too, with the robots doing something. The two are just standing there, but whatever they do in that location and why they’re there should be immediately apparent to the viewer. Are they guarding something, or protecting something perhaps? Apparently there is a fight of sorts, but it’s nowhere near this place, so there seems to be nothing to worry about.
Character design could reflect their purpose.
The foreground character is holding a poster of some kind; it could be handing it to someone (who is maybe off camera), dropping something, or signaling something.
Top image and the thumbnail:
Values get mixed with the background, and the forms are difficult to read and to differentiate from each other. That flattens the image, and with things like the wall corner and the left arm of the foreground character forming a bit of a tangent with each other, it gets confusing.
Also not sure if you noticed and if it’s intentional somehow, but the numbering on the back wall kind of continues with the chest plate number because it’s aligned quite perfectly in perspective.
Middle image:
Having value separation between the characters and the background helps to keep them separate in space. There are lights in the background so plenty of excuses to have it lighter because of “fog” or “dust” for example.
Bottom image:
There could be more movement. For example, there could be two bots going towards the left side, and they could be escorting something/someone, dragging something, or moving alongside a vehicle. The foreground bot could turn and look more towards the camera and go “what are you looking at!”, or it could give a stop signal/gesture to some people at the back left of the camera.
Show where the two are going. Maybe you could build a fence with signs that reveal the nature of things: “Hydraulics Anonymous”, “Police”, “Prison”, “Wall-Wart” or something of the sort.