Struggling with 2D transformation matrices

You may safely skip this paragraph

Hi there. To be frank, I’m 99.9% sure my current problem is strictly a linear algebra problem (i.e. a lack of understanding of linear algebra on my part).
In that sense, it is probably not a Python issue, let alone a bpy issue, technically speaking. So one might ask me, with some justification, to take this someplace else (e.g. math stackexchange).
On the other hand, I’m facing the issue while writing Python code which is closely tied to bpy, and making it clear which results I’m trying to obtain, and what actually goes wrong with the results I currently do obtain, would be somewhat painful without the Python and bpy context.
So I’d rather ask here.

What I try to achieve

My code is basically supposed to emulate the way the sequencer preview window renders a color strip. The key difference: it draws to the 3D Viewport (in camera view [numpad 0]) via the GPU module; it does not draw to the sequencer preview.
So if you just add a new color strip in the sequencer, it will render a screen-filling rectangle in the sequencer preview (of whatever color the strip uses, by default black).
Of course, things are more complex than that, for the strip has its transform attributes (offset_x, offset_y, scale_x, scale_y, rotation, origin[0], origin[1], use_flip_x, use_flip_y).

The challenge I’m facing is properly emulating the effects those attributes have on the sequencer preview of our color strip (with respect to the camera frame when in camera view [numpad 0] in the 3D View).

.
How I go about it

In brief, I use a function I called update_shot_verts() to prepare a tuple (length 4) of mathutils.Vector (each of size 2).
The (2D) coordinates in this tuple are (somewhat implicitly) assumed to represent normalized (0-1 range) coordinates w.r.t. the camera frame (basically NDCs). The lower left corner is assumed to be at X = Y = 0.
It is also inside update_shot_verts() that I (try to) derive (3x3) transformation matrices from the transform attributes of a sequencer strip.
In other words:

  • I start with NDCs (Vector((0.0, 0.0, 1.0)), Vector((1.0, 0.0, 1.0)), Vector((1.0, 1.0, 1.0)), Vector((0.0, 1.0, 1.0))). Note at this point the vectors are (X, Y, W), for the sake of homogeneous coordinates (a minimal sketch of this convention follows right after this list).
  • I derive a rotation-, scale-, mirror-, and translation-matrix from a sequencer strip’s transform attributes. It might easily be about here where I start doing things the wrong way.
  • I then multiply the matrices to obtain a final composed matrix, and multiply each of the vectors from above by that composed matrix. After shortening all vectors to 2D (X, Y) instead of (X, Y, W), I hand them over to the draw_callback().
  • It is inside a custom GLSL shader that the NDCs eventually get remapped to the camera frame.
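
As announced above, here is a minimal sketch of that homogeneous-coordinate convention (nothing beyond mathutils assumed): a 3x3 translation matrix applied to one corner vector, with W = 1 making the translation take effect.

import mathutils as mu

# 3x3 homogeneous translation: the last column carries the (X, Y) offset
m = mu.Matrix.Identity(3)
m.col[2] = mu.Vector((0.25, 0.1, 1.0))

# lower left NDC corner as (X, Y, W)
v = mu.Vector((0.0, 0.0, 1.0))
print(m @ v)  # -> Vector((0.25, 0.1, 1.0))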


Here’s the code. The important bits are the function update_shot_verts() and, to a lesser extent, the GLSL shader code (nothing special there, though). The rest is pure periphery (mainly a modal operator and the draw callback it uses).

import bpy, gpu, math
import mathutils as mu

from gpu_extras.batch import batch_for_shader
from bpy_extras import view3d_utils as extras

# glsl shader code (uniform color)
vert_shader_uni = """
in vec2 pos;
in vec2 offset;
in vec2 offset_max;
//in vec2 cam_frame;
flat out vec4 finalColor;

uniform vec4 color;
uniform mat4 ModelViewProjectionMatrix;

// standard maprange implementation
float maprange(in float val,
                in float min_in,
                in float max_in,
                in float min_out,
                in float max_out)
{
    float result =
        min_out + (max_out - min_out) * ((val - min_in) / (max_in - min_in));
    
    return result;
}

void main()
{
    //vec4 rel_pos = vec4((((cam_frame.xy - offset.xy) * pos.xy) + offset.xy),
    //                    0.0, 1.0);
    //vec4 rel_pos = vec4((((cam_frame.xy - offset.xy) + ((cam_frame.xy -
    //                    offset.xy) * pos.xy)) + offset.xy), 0.0, 1.0);
    
    //vec4 rel_pos = vec4((cam_frame.xy - offset.xy), 0.0, 1.0);
    //rel_pos.x = maprange(pos.x, 0.0, 1.0, 0.0, rel_pos.x);
    //rel_pos.y = maprange(pos.y, 0.0, 1.0, 0.0, rel_pos.y);
    
    vec4 rel_pos = vec4(0.0, 0.0, 0.0, 1.0);
    rel_pos.x += maprange(pos.x, 0.0, 1.0, offset.x, offset_max.x);
    rel_pos.y += maprange(pos.y, 0.0, 1.0, offset.y, offset_max.y);
    
    //rel_pos += vec4((offset.xy * 2.0), 0.0, 1.0);
    
    gl_Position = ModelViewProjectionMatrix * rel_pos;


    finalColor = color;
}
"""

frag_shader_uni = """
flat in vec4 finalColor;
out vec4 fragColor;

uniform float ui_alpha = 0.5;

void main()
{
  fragColor = mix(vec4(0.0), finalColor, ui_alpha);
}
"""


def find_valid_view3d(context):
    '''Finds all 3D-view areas which are in camera-view and returns them as a
        set. Currently assumes only a single window is open.
        May return an empty set.'''
    
    screen = context.screen
    reg_view3D = set()
    
    # get all 3D-View areas in context.screen
    reg_view3D.update([area for area in screen.areas if area.type == 'VIEW_3D'])
    # list of all 3D-View areas not in camera-view (note area.spaces[0] is
    # always the currently active space of the area).
    invalid = [area for area in reg_view3D if
                area.spaces[0].region_3d.view_perspective != 'CAMERA']

    # set.difference returns a new set (empty if both sets share all
    # elements), unlike the in-place set.difference_update(), which returns None.

    return reg_view3D.difference(set(invalid))


def view3d_camera_frame(context, cameraobj, region=None, region_view3d=None):
    '''Get region-relative 2D-coordinates of a camera's frame in pixel values.
        More precisely, the 2D-coordinates of each vertex of the camera's
        rectangular frame (as seen in the viewport in camera-view).
        Return value is a list of 2D mathutils.Vector, in the following order:
        [lower_left, lower_right, upper_right, upper_left].'''

    camera = cameraobj.data
    scene = context.scene
    
    # check for optional parameters
    if region is None:
        region = context.region

    if region_view3d is None:
        region_view3d = context.space_data.region_3d

    frame = camera.view_frame(scene=scene)
    # reorder elements to counter clockwise order starting from lower left corner
    frame = [frame[i] for i in (2, 1, 0, 3)]
    matrix = cameraobj.matrix_world.normalized()
    # Transform from object-space into world-space 
    frame = [matrix @ vector for vector in frame]

    # Transform into pixelspace
    # Note we assume context.region.type == 'WINDOW' here (is usually the case).
    frame_px = [extras.location_3d_to_region_2d(region, region_view3d, v) for
                v in frame]
    
    return frame_px


def update_shot_verts(context):
    '''Extracts camera-relative 2D coordinates from a sequencer strip, in
        normalized 0-1 range, where (0, 0) represents the lower left corner.'''
    
    # Corner vectors in normalized 0-1 range (NDC). Note we assume counter
    # clockwise order, starting at lower left corner, as elsewhere in this
    # whole package.
    first = mu.Vector((0.0, 0.0, 1.0))
    second = mu.Vector((1.0, 0.0, 1.0))
    third = mu.Vector((1.0, 1.0, 1.0))
    fourth = mu.Vector((0.0, 1.0, 1.0))
    vert_pos = [first, second, third, fourth]
    # get strip data. Note we assume an open sequencer window and an active strip
    # here for code brevity in this simplified test scenario.
    strip = context.active_sequence_strip
    trans = strip.transform
    # get output resolution
    res_x = context.scene.render.resolution_x
    res_y = context.scene.render.resolution_y
    # get normalized offsets
    shift_x = trans.offset_x / res_x
    shift_y = trans.offset_y / res_y
    # 
    # translation matrix to re-center at lower left corner before rot/scale
    matrix_shift_pre = mu.Matrix.Identity(3)
    matrix_shift_pre.col[2] = mu.Vector([-trans.origin[0],
                                        -trans.origin[1],
                                        1.0,])
    # translation matrix to re-position vectors after the rotation/scaling
    matrix_shift_post = matrix_shift_pre.inverted()
    #
    # prepare rotation matrix
    matrix_rotate = mu.Matrix.Rotation(trans.rotation, 3, 'Z')
    # prepare scaling matrix
    matrix_scale = mu.Matrix.Diagonal(mu.Vector([trans.scale_x,
                                                trans.scale_y,
                                                1.0]))
    # prepare translation matrix
    matrix_trans = mu.Matrix.Identity(3)
    matrix_trans.col[2] = mu.Vector((shift_x,
                                    shift_y,
                                    1.0,))
    # prepare identity matrices in case trans.use_flip_x or trans.use_flip_y
    # are False respectively
    matrix_mirror_x = mu.Matrix.Identity(3)
    matrix_mirror_y = mu.Matrix.Identity(3)
    # prepare dedicated X-mirror matrix, if needed
    if strip.use_flip_x:
        matrix_mirror_x = mu.Matrix.Diagonal(mu.Vector([1.0, -1.0, 1.0]))
    # prepare dedicated Y-mirror matrix, if needed
    if strip.use_flip_y:
        matrix_mirror_y = mu.Matrix.Diagonal(mu.Vector([-1.0, 1.0, 1.0]))
        
    print('shift_pre: \n {a}'.format(a=matrix_shift_pre))
    print('shift_post: \n {a}'.format(a=matrix_shift_post))
    print('mirr_x: \n {a}'.format(a=matrix_mirror_x))
    print('mirr_y: \n {a}'.format(a=matrix_mirror_y))
    print('trans: \n {a}'.format(a=matrix_trans))
    print('rot: \n {a}'.format(a=matrix_rotate))
    print('scale: \n {a}'.format(a=matrix_scale))
    
    # prepare final matrices
    matrix_rot_final = matrix_shift_pre @ matrix_rotate @ matrix_shift_post
    # 
    matrix_mirror_final = matrix_mirror_x @ matrix_mirror_y
    # 
    matrix_scale = matrix_scale @ matrix_mirror_final
    matrix_scale_final = matrix_shift_pre @ matrix_scale @ matrix_shift_post 
    # 
    matrix_final = matrix_scale_final @ matrix_rot_final @ matrix_trans

    # multiply matrices by vectors
    vert_pos = [matrix_final @ vector for vector in vert_pos]
    
    # convert length 3 Vectors(X, Y, W) used to ensure homogeneous
    # coordinates (needed for matrix operations) to length 2 Vectors(X, Y),
    # which our shader expects.
    for i in range(4):
        vert_pos[i].resize_2d()
    # convert 2D-Vectors to nested tuples and return them
    result = tuple(tuple(pos) for pos in vert_pos)
    print(result)
    return result


def draw_callback():
    '''Draw callback.'''
    
    context = bpy.context
    # assume scene.camera != None for code brevity in this test-scenario
    cam = context.scene.camera
    color_opacity = 0.8
    # update vertex vectors
    verts = update_shot_verts(context)
    
    # get set of all 3D-view areas being in camera-view
    areas_view3D = find_valid_view3d(context)
    # assume no empty set for code brevity in this test-scenario
    region = [region for region in areas_view3D.pop().regions if
                region.type == 'WINDOW'][0]
    region_view3d = region.data
                
    # 
    frame = view3d_camera_frame(context, cam, region, region_view3d)
    # repeat each corner x4 to match the VBO length
    camera_min = tuple(frame[0] for _ in range(4))
    camera_max = tuple(frame[2] for _ in range(4))
    
    shader = gpu.types.GPUShader(vert_shader_uni, frag_shader_uni)
    batch = batch_for_shader(shader, 'TRI_FAN', {'pos':verts,
                                                'offset':camera_min,
                                                'offset_max':camera_max})
    shader.uniform_float('color', (0.8, 0.3, 0.0, 1.0))
    shader.uniform_float('ui_alpha', color_opacity)

    # render to viewport
    batch.draw(shader)
    
    
class Something_OT_something(bpy.types.Operator):
    bl_idname = 'camera.frame_test'
    bl_label = 'Camera Frame Test'

    def __init__(self):
        self.handle = None

    def execute(self, context):
        self.report({'WARNING'}, 'Operator has no execution. Use as modal.')
        return {'CANCELLED'}

    def invoke(self, context, event):
        self.report({'INFO'}, 'Start realtime update.')
        # Prepare tuple of arguments for draw_callback
        args = ()
        # Add draw handler
        self.handle = bpy.types.SpaceView3D.draw_handler_add(draw_callback,
                                                            args,
                                                            'WINDOW',
                                                            'POST_PIXEL')
        context.window_manager.modal_handler_add(self)
        # Force redraw on all 3D-view areas
        self.force_redraw(context)
        return {'RUNNING_MODAL'}

    def modal(self, context, event):
        if event.type == 'ESC':
            return self.finish(context)
        return {'PASS_THROUGH'}

    def finish(self, context):
        # Remove draw handler
        bpy.types.SpaceView3D.draw_handler_remove(self.handle, 'WINDOW')
        # Force redraw on all 3D-view areas
        self.force_redraw(context)
        self.report({'INFO'}, 'Stopped realtime update.')
        return {'FINISHED'}

    def force_redraw(self, context):
        # Force redraw on all 3D-view areas
        for area in context.screen.areas:
            if area.type == 'VIEW_3D':
                area.tag_redraw()

bpy.utils.register_class(Something_OT_something)

.
Known limitations

I do realize one thing (but it is not my main concern and overshadowed by what else goes wrong):
In most cases where the transform.rotation attribute is != 0.0, one or several of the rectangle’s corners will naturally end up outside the boundaries of the camera frame (unless counteracted via transform.scale_x/y).
This of course invalidates the assumption that all the coordinates which go into the vertex shader as vec2 pos are NDCs in the strict sense, i.e. somewhere within the 0.0-1.0 range.
I am well aware the vertex shader itself, as currently implemented, relies on that assumption and cannot be expected to get the mapping right when the input violates it.

With that said, fixing this would need a deeper refactor of the GLSL code (including a more complex fragment shader to clip things outside the frame, I suppose), but it is of limited relevance to what I’m trying to do here.
In other words, my interest in making the rotation in particular work is severely limited from the get-go, and I might eventually just decide to silently ignore transform.rotation entirely.

.
What I need help with

Well, as much as I’ve tried to read up on transformation matrices and how to use them, I can’t quite seem to make it work properly, and only seem to confuse myself further by trying.
I guess, at its core, I mainly struggle to understand

  • in what order I need to multiply the matrices to obtain a single final transformation matrix
  • how to factor in the values from transform.origin properly (strangely enough, they seem to even affect what transform.offset_x/y does, like some weird extra offset, which doesn’t seem to make all that much sense to me, conceptually).
  • how do I factor in transform.use_flip_x/y. I know a 2D mirroring matrix should be
X-mirror:           Y-mirror:
|-1.0  0.0  0.0|    | 1.0  0.0  0.0|
| 0.0  1.0  0.0|    | 0.0 -1.0  0.0|
| 0.0  0.0  1.0|    | 0.0  0.0  1.0|

…but firstly, using the one for X-mirroring seems to result in mirroring on the Y-axis instead, and vice versa (god knows why), and secondly, I do not quite know how to combine such matrices with whichever other transforms I (potentially) need. (A small sanity-check of these matrices in isolation follows right after this list.)

  • should I multiply the X-mirror matrix by the Y-mirror matrix in case transform.use_flip_x == transform.use_flip_y == True?
  • should I multiply the mirror matrix with the scale matrix, since the mirror matrix is technically just a specific scale matrix itself?
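
For what it’s worth, here is the announced sanity-check of those two matrices in isolation (nothing beyond mathutils assumed), including combining both flips into one matrix by multiplication:

import mathutils as mu

mirror_x = mu.Matrix.Diagonal(mu.Vector((-1.0, 1.0, 1.0)))  # negates X
mirror_y = mu.Matrix.Diagonal(mu.Vector((1.0, -1.0, 1.0)))  # negates Y

v = mu.Vector((1.0, 0.5, 1.0))
print(mirror_x @ v)               # -> Vector((-1.0, 0.5, 1.0))
print((mirror_x @ mirror_y) @ v)  # -> Vector((-1.0, -0.5, 1.0))

Note that with coordinates in the 0-1 range these matrices reflect about the X = 0 / Y = 0 lines, so the unit square lands in negative coordinate territory unless it is shifted back afterwards; that may well be part of what looks swapped.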

.
Thanks

For any help or clarifications you might be able to provide. I know I’m probably overlooking something obvious here, as usual. As always, feel free to ask if my long-winded explanations were too confusing.

greetings, Kologe

final  = mat_origin.copy()
final @= mat_mirror
final @= mat_loc
final @= mat_rot
final @= mat_scale
final @= mat_origin.inverted()

You pretty much factored in origin correctly. You start the matrix with the origin, then multiply the components as above, then finalize with the inverted origin.
The origin controls the pivot for matrix rotation and scale.
Bare matrices inherently rotate and scale about the lower left corner (origin = (0, 0)).
With origin at (0.5, 0.5), the strip’s center becomes the pivot.
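
As a minimal illustration of that pivot conjugation (nothing beyond math and mathutils assumed): translate the pivot to the origin, rotate, translate back.

import math
import mathutils as mu

origin = mu.Matrix.Translation((0.5, 0.5, 0.0))  # pivot at the strip center
rot = mu.Matrix.Rotation(math.radians(180.0), 4, 'Z')

pivoted = origin @ rot @ origin.inverted()
# a 180-degree spin about the center maps the lower left corner onto the
# upper right one:
print(pivoted @ mu.Vector((0.0, 0.0, 0.0)))  # -> Vector((1.0, 1.0, 0.0))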

Mirror happens right before L/R/S is multiplied. Your mirror matrix is correct.

Edit:
Just tested your script and realized I had missed the mirror adjustment and the aspect ratio.
These need to be baked into the pre-origin translation and the scale’s X component, respectively.

Here’s a modified version of your function.

def update_shot_verts(context):
    '''Extracts camera-relative 2D coordinates from a sequencer strip, in
        normalized 0-1 range, where (0, 0) represents the lower left corner.'''
    
    # Corner vectors in normalized 0-1 range (NDC). Note we assume counter
    # clockwise order, starting at lower left corner, as elsewhere in this
    # whole package.
    first = mu.Vector((0.0, 0.0, 1.0))
    second = mu.Vector((1.0, 0.0, 1.0))
    third = mu.Vector((1.0, 1.0, 1.0))
    fourth = mu.Vector((0.0, 1.0, 1.0))
    vert_pos = [first, second, third, fourth]
    # get strip data. Note we assume an open sequencer window and an active strip
    # here for code brevity in this simplified test scenario.
    strip = context.active_sequence_strip
    trans = strip.transform
    # get output resolution
    res_x = context.scene.render.resolution_x
    res_y = context.scene.render.resolution_y

    mat_mirror = mu.Matrix.Diagonal((1.0, 1.0, 1.0, 1.0))
    mat_loc = mu.Matrix.Translation((trans.offset_x / res_x, trans.offset_y / res_y, 0.0))

    aspect = res_x / res_y
    mat_scale = mu.Matrix.Diagonal((trans.scale_x * aspect, trans.scale_y, 1.0, 1.0))
    mat_aspect = mu.Matrix.Diagonal((1.0 / aspect, 1.0, 1.0, 1.0))
    mat_rot = mu.Matrix.Rotation(trans.rotation, 4, 'Z')

    mat_origin = mu.Matrix.Translation((*trans.origin, 0.0))
    mat_final = mat_origin.copy()

    # Mirror the start origin.
    if strip.use_flip_x:
        mat_mirror[0][0] = -1
        mat_final[0][3] = 1.0 - mat_final[0][3]

    if strip.use_flip_y:
        mat_mirror[1][1] = -1
        mat_final[1][3] = 1.0 - mat_final[1][3]

    mat_final @= mat_mirror
    mat_final @= mat_loc
    mat_final @= mat_aspect
    mat_final @= mat_rot
    mat_final @= mat_scale
    mat_final @= mat_origin.inverted()

    # multiply matrices by vectors
    vert_pos = [mat_final @ vector for vector in vert_pos]
    
    # shorten the length 3 Vectors to length 2 Vectors(X, Y), which our shader
    # expects. Note the 4x4 matrices above treat them as (X, Y, Z) with an
    # implicit W = 1, so the third component is Z here, not W.
    for i in range(4):
        vert_pos[i].resize_2d()
    # convert 2D-Vectors to nested tuples and return them
    result = tuple(tuple(pos) for pos in vert_pos)
    print(result)
    return result

For clipping fragments outside the camera boundary, you just need to pass the camera rect (lower left, upper right) to the fragment shader and discard frag coords that go outside it.
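
A minimal sketch of such a fragment shader (the uniform name cam_rect is hypothetical; it assumes the rect is supplied as (min.x, min.y, max.x, max.y) in the same pixel space gl_FragCoord uses):

frag_shader_clip = """
flat in vec4 finalColor;
out vec4 fragColor;

uniform float ui_alpha = 0.5;
// hypothetical uniform: camera frame as (min.x, min.y, max.x, max.y)
uniform vec4 cam_rect;

void main()
{
    // drop fragments outside the camera rectangle
    if (gl_FragCoord.x < cam_rect.x || gl_FragCoord.x > cam_rect.z ||
        gl_FragCoord.y < cam_rect.y || gl_FragCoord.y > cam_rect.w)
        discard;

    fragColor = mix(vec4(0.0), finalColor, ui_alpha);
}
"""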

Thanks again, you’re most helpful.
Much more so than the PDF versions of lecture slides from like five different universities, which are basically all the same and tell me about all the general concepts, but e.g. all fail to tell me this:

“Mirror happens right before L/R/S is multiplied.”

…although they do of course talk about composing L/R/S matrices and do not fail to mention mirroring itself. :neutral_face:

I would probably never have gotten that right (especially the aspect ratio, which I’d have overlooked for all eternity).


I suppose the reason you’re using mostly 4x4 matrices is that mathutils.Matrix.Translation() silently assumes we’re working in 3D, returns a 4x4 matrix, and doesn’t have a size parameter?
I wonder whether there’s a technical reason for that, or if it’s just an implementation detail (in the API, I mean)?
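
For what it’s worth, a quick check in the Python console confirms that behaviour (and to_3x3() can shrink the result afterwards):

import mathutils as mu

# Matrix.Translation() takes only a vector and has no size parameter;
# the result is always 4x4:
m = mu.Matrix.Translation((1.0, 2.0, 0.0))
print(len(m), len(m[0]))  # -> 4 4
print(m.to_3x3())         # drops the translation column, by the way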

Ah yes, I remember, there’s that discard statement in glsl. That makes sense.

greetings, Kologe


I just find them easier to work with. For 3D rotations we only need 3x3, for 2D rotation just 3x2, but for rotating with an embedded translation we need 4x4 anyway.

Mirroring can be put in any order (up to scale) for a variety of effects.

If you activate X mirror and try to translate horizontally, you’ll notice the direction is flipped, e.g. adding +px causes the strip to move to the left instead of the right. That’s how you can tell the mirror is computed before location.
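
A quick way to see that with mathutils (a minimal sketch of the two possible orders):

import mathutils as mu

mirror_x = mu.Matrix.Diagonal((-1.0, 1.0, 1.0, 1.0))
loc = mu.Matrix.Translation((0.1, 0.0, 0.0))  # nudge +0.1 on X

v = mu.Vector((0.5, 0.5, 0.0))
print((mirror_x @ loc) @ v)  # -> Vector((-0.6, 0.5, 0.0)): the +0.1 ends up going left
print((loc @ mirror_x) @ v)  # -> Vector((-0.4, 0.5, 0.0)): translation unaffected by the mirror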

When you wrote 3x2, was that a typo?
From my understanding a 3x2 rotation matrix would contradict the idea of working with homogeneous coordinates, wouldn’t it? You couldn’t really use matrix multiplication to compose a final transformation matrix.

Oh yes, I see.

In Blender we would need a 3x3 for 2D transforms only because there is no support for non-square matrix transforms. For linear 2D transforms that don’t explicitly require a Z component, the Z row isn’t used and could technically be omitted. If the end result is 2D and you need a 3D transform, you would need 3x3.

I guess what I’m saying is: no, it’s not a typo, and yes, we can’t use that in Blender. The 3x2 is the theoretical minimum, but it’s definitely not practical.
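
Put differently, for a purely 2D affine transform the bottom row of the 3x3 stays (0, 0, 1) no matter what you compose, which is why a 3x2 would carry all the actual information. A quick check:

import math
import mathutils as mu

rot = mu.Matrix.Rotation(math.radians(30.0), 3, 'Z')
trans = mu.Matrix.Identity(3)
trans.col[2] = mu.Vector((0.2, 0.3, 1.0))

m = trans @ rot
print(m[2])  # -> Vector((0.0, 0.0, 1.0)): the bottom row never changes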

Edit typo: I meant Z component, not W, which is perspective division :slight_smile: