To answer your question, first we need to distinguish between:
Game-world actions: what the player does within the game world … “what you perceive that you are doing.”
Game animations: “what you actually observe” with your eyes and other senses. (Also: the movements that you make with your fingers, hands, and body against physical devices.)
Game programmers construct elaborate computer simulations of their game world (which are, in fact, the entire manifestation of that world). They write logic that maps every physical thing a player can do (touch the screen, tilt the machine, shake the thing) into a corresponding event within that purely imaginary world. Most of these “game-world inputs,” of course, result in a “game-world output.”
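To make the input-mapping idea concrete, here is a minimal sketch in Python. Everything in it (the `RawInput` and `GameEvent` types, the `map_input` function, and the particular touch/tilt/shake meanings) is made up for illustration; a real game would define its own inputs and events.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class RawInput:
    """A physical action reported by the device (hypothetical type)."""
    kind: str            # e.g. "touch", "tilt", "shake"
    value: tuple = ()    # device-specific payload


@dataclass(frozen=True)
class GameEvent:
    """Something that happens inside the imaginary game world (hypothetical type)."""
    name: str
    params: dict = field(default_factory=dict)


def map_input(raw: RawInput) -> Optional[GameEvent]:
    """Translate a physical action into a game-world event, or None if it means nothing."""
    if raw.kind == "touch":
        x, y = raw.value
        return GameEvent("select_at", {"x": x, "y": y})
    if raw.kind == "tilt":
        (angle,) = raw.value
        # Assumed convention: a negative angle means the device leans left.
        return GameEvent("steer", {"direction": "left" if angle < 0 else "right"})
    if raw.kind == "shake":
        return GameEvent("reload")
    return None  # inputs the game world does not care about
```

Note that the game world never sees "a finger at pixel (120, 300)" directly; it only sees the event that the mapping produced.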
And then they determine what the game hardware should display next for each “game-world output.” That’s where “animation” comes in. The game engine first decides what the next high-level response ought to be, then the lower-level animation system (be it 2D, 3D, or both) carries it out. During the course of any game, many such animations are taking place simultaneously.
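A rough sketch of that split, again in Python: the game logic only decides which animations should start, and a separate animation system advances all of the active ones each frame. The class names, the `respond` function, and the output names ("door_opened", "player_hit") are assumptions invented for this example.

```python
from dataclasses import dataclass


@dataclass
class Animation:
    name: str
    duration: float       # seconds
    elapsed: float = 0.0

    @property
    def finished(self) -> bool:
        return self.elapsed >= self.duration


class AnimationSystem:
    """Low level: advances every running animation, many at once."""

    def __init__(self):
        self.active = []

    def play(self, name, duration):
        self.active.append(Animation(name, duration))

    def update(self, dt):
        # Advance all animations by the frame time, then drop the finished ones.
        for anim in self.active:
            anim.elapsed += dt
        self.active = [a for a in self.active if not a.finished]


def respond(output, animations):
    """High level: decide which animations a game-world output should trigger."""
    if output == "door_opened":
        animations.play("door_swing", 0.5)
        animations.play("dust_puff", 0.2)   # runs alongside the door swing
    elif output == "player_hit":
        animations.play("flinch", 0.3)
```

Calling `respond("door_opened", anims)` and then `anims.update(dt)` once per frame keeps both animations running side by side until each one ends on its own schedule.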
In 3D games, the game-world engine is very much “3D aware”: it keeps track of where the various actors are, relative to one another, in game-world 3D space. The visual-effects subsystem then knows how to translate this into the purely mechanical “pixel space” that the user sees on the screen, typically by projecting the scene through a virtual camera.
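One common way to get from 3D game-world coordinates to pixels is a perspective projection. The sketch below is a deliberately simplified version under stated assumptions: the camera sits at the origin looking down the +z axis, aspect ratio is ignored, and the `project` function and its default screen size are invented for the example. Real engines usually do this with matrices on the GPU.

```python
def project(point, focal_length=1.0, width=800, height=600):
    """Map a 3D point in camera space to a pixel position, or None if not visible.

    Assumes the camera is at the origin looking along +z.
    """
    x, y, z = point
    if z <= 0:
        return None  # at or behind the camera: nothing to draw

    # Perspective divide: points farther away move toward the screen center.
    ndc_x = focal_length * x / z
    ndc_y = focal_length * y / z

    # Map the range [-1, 1] onto the screen; pixel y grows downward.
    px = (ndc_x + 1) * 0.5 * width
    py = (1 - ndc_y) * 0.5 * height
    return round(px), round(py)
```

A point straight ahead, such as `(0, 0, 5)`, lands in the middle of the screen, and the same x offset shrinks on screen as z grows, which is what makes distant actors look small.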
So, conceptually, there is a game layer, which creates and manages an entirely virtual game world, and a display layer, which manages the presentation on the hardware (and interprets, for the game layer, the user-interface actions that you generate). The two layers are loosely connected.
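As a toy illustration of "loosely connected," here is one possible arrangement in Python where the two layers only exchange plain event names. The `GameLayer`, `DisplayLayer`, and `run_frame` names, and the coin-collecting rule, are all hypothetical; this is just one way to wire it, not how any particular engine does it.

```python
from collections import deque


class GameLayer:
    """Knows the rules of the game world; knows nothing about pixels or devices."""

    def __init__(self):
        self.score = 0

    def handle(self, event):
        # Returns the game-world outputs caused by this event.
        if event == "collect_coin":
            self.score += 1
            return ["coin_collected"]
        return []


class DisplayLayer:
    """Knows the hardware; knows nothing about the rules."""

    def __init__(self):
        self.drawn = []

    def interpret(self, raw):
        # Translate a physical action into a game-world event name (or None).
        return {"tap": "collect_coin"}.get(raw)

    def present(self, output):
        # Stand-in for starting an animation or drawing to the screen.
        self.drawn.append(f"play:{output}")


def run_frame(raw_inputs, game, display):
    """One frame: inputs flow to the game layer, outputs flow back to the display."""
    events = deque(e for e in map(display.interpret, raw_inputs) if e is not None)
    while events:
        for output in game.handle(events.popleft()):
            display.present(output)
```

Because the only things crossing the boundary are event and output names, either layer could be swapped out (a different renderer, a different rule set) without touching the other.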