Detecting position and rotation of the OpenMV camera relative to a computer screen

Hi all, I am trying to use my OpenMV camera to find out where my computer screen is relative to the camera for an eye-tracking experiment. My current setup is one OpenMV camera mounted on 3D-printed eyewear, facing forward toward the computer screen, and another facing the user’s eyes for infrared eye tracking (not the focus of this topic).

I had originally tried using blob tracking to determine the corners/edges of the screen (the screen is just set to pure white for now to provide strong contrast against the background), but min_corners did not accurately give me the 4 corners. I’m now using infinite line detection and extracting the intersection points with math to find the corners of the screen; sometimes it’s off by a few pixels, but it is the most accurate solution I have for now, though I’m open to suggestions. I’ve also experimented with AprilTag camera pose estimation, which looked promising at first, but with my current setup the tags are too small to be detected at that distance (roughly more than 50 cm from the screen) and at that resolution (limited by the memory buffer), and it would be impractical to print out massive AprilTags to place around the screen.
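
(For context, the intersection step itself is just the standard two-line intersection formula; a rough sketch of what I’m doing is below, where the endpoint tuples come from the .line() method of the objects returned by img.find_lines(). My actual code differs a little.)

def line_intersection(l1, l2):
    # l1 and l2 are (x1, y1, x2, y2) endpoint tuples describing two infinite lines.
    x1, y1, x2, y2 = l1
    x3, y3, x4, y4 = l2
    d = (x1 - x2) * (y3 - y4) - (y1 - y2) * (x3 - x4)
    if d == 0:
        return None  # the lines are parallel, no intersection
    px = ((x1 * y2 - y1 * x2) * (x3 - x4) - (x1 - x2) * (x3 * y4 - y3 * x4)) / d
    py = ((x1 * y2 - y1 * x2) * (y3 - y4) - (y1 - y2) * (x3 * y4 - y3 * x4)) / d
    return (px, py)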

Anyway, back to the problem: given that I have the 4 corners of the computer screen in pixel coordinates, and I can find the focal length and field of view of the lens from the store page, how do I find the position and rotation of the camera relative to the screen, given that I also know the real-life size of the screen? It all seems to make sense that these factors are related, but I just can’t seem to piece it together. Are there any built-in functions for this? Ideally, once I can get the position and rotation of the camera relative to the screen, I hope to combine this with my other OpenMV camera facing the eye and determine which point of the screen the user is looking at. Any help on this matter is greatly appreciated.

Hi, yeah, this is generally solved by mathematics.
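
In equation form, what you’re trying to recover is the planar pose: each screen corner (X, Y, 0) in meters projects to its pixel (u, v) through the camera intrinsics K and the unknown rotation R and translation t:

s * [u, v, 1]^T = K * [R | t] * [X, Y, 0, 1]^T

Because the screen is flat, the four corner correspondences are enough to pin down R and t; that is exactly what a homography decomposition (or OpenCV’s solvePnP) computes.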

Actually, I’d just ask ChatGPT for help on this. Given that we have the ulab library installed, more or less any numpy math code it gives you should run on the camera, and it’s basically what you want.

It looks like you really want the solvePnP method from OpenCV, which we don’t have onboard… but after prompting ChatGPT, it gave me something like this:

from ulab import numpy as np  # ulab's numpy lives under ulab.numpy on current firmware

# The four corners of the screen in pixel coordinates
image_points = np.array([
    [x1, y1, 1],  # top-left corner
    [x2, y2, 1],  # top-right corner
    [x3, y3, 1],  # bottom-right corner
    [x4, y4, 1]   # bottom-left corner
])

# The size of the screen in real life (in meters)
screen_width = 0.6
screen_height = 0.35

# The four corners of the screen in real world coordinates
object_points = np.array([
    [0, 0, 1],                 # top-left corner
    [screen_width, 0, 1],      # top-right corner
    [screen_width, screen_height, 1],  # bottom-right corner
    [0, screen_height, 1]     # bottom-left corner
])

# Compute the homography H using the DLT algorithm
A = []
for i in range(4):
    X = object_points[i]
    x = image_points[i]
    A.append([-X[0], -X[1], -1, 0, 0, 0, x[0]*X[0], x[0]*X[1], x[0]])
    A.append([0, 0, 0, -X[0], -X[1], -1, x[1]*X[0], x[1]*X[1], x[1]])
A = np.array(A)
_, _, v = np.linalg.svd(A)  # needs SVD support in ulab (see the note below)
H = v[-1].reshape((3, 3))

# Normalize H so that H[2, 2] = 1
H = H / H[2, 2]

# Compute the camera pose from the homography
K = ...  # The intrinsic camera matrix
inv_K = np.linalg.inv(K)
h1 = H[:, 0]
h2 = H[:, 1]
h3 = H[:, 2]

lambda_ = 1 / np.linalg.norm(np.dot(inv_K, h1))
r1 = lambda_ * np.dot(inv_K, h1)
r2 = lambda_ * np.dot(inv_K, h2)
# Third rotation column: cross product of the first two (written out in case np.cross is unavailable in ulab).
r3 = np.array([r1[1] * r2[2] - r1[2] * r2[1],
               r1[2] * r2[0] - r1[0] * r2[2],
               r1[0] * r2[1] - r1[1] * r2[0]])
t = lambda_ * np.dot(inv_K, h3)

# ulab has no column_stack; concatenate the columns as rows, then transpose.
R = np.concatenate((r1, r2, r3)).reshape((3, 3)).transpose()
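
For the K = ... placeholder, a minimal sketch of building the intrinsic matrix. The numbers below assume the OV7725 sensor (3.984 mm x 2.952 mm active area) with the stock 2.8 mm lens at QQVGA, so swap in your own lens, sensor, and resolution:

from ulab import numpy as np

# Assumed example values; replace with your lens, sensor, and resolution.
lens_focal_mm = 2.8
sensor_w_mm, sensor_h_mm = 3.984, 2.952
img_w, img_h = 160, 120

fx = (lens_focal_mm / sensor_w_mm) * img_w   # focal length in pixels, x
fy = (lens_focal_mm / sensor_h_mm) * img_h   # focal length in pixels, y
cx, cy = img_w * 0.5, img_h * 0.5            # principal point, assumed at the image center

K = np.array([[fx, 0.0, cx],
              [0.0, fy, cy],
              [0.0, 0.0, 1.0]])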

Anyway, I suggest using it to help with these problems. It’s super great for getting something kind of working as a starting point.

That said, for the above… we just had a question about ulab missing an svd method. I might need to bug the creator about adding that.

This is what it spit out originally:

import numpy as np
import cv2

# The four corners of the screen in pixel coordinates
image_points = np.array([
    (x1, y1),  # top-left corner
    (x2, y2),  # top-right corner
    (x3, y3),  # bottom-right corner
    (x4, y4)   # bottom-left corner
], dtype="double")

# The size of the screen in real life (in meters)
screen_width = 0.6
screen_height = 0.35

# The four corners of the screen in real world coordinates
object_points = np.array([
    (0, 0, 0),                 # top-left corner
    (screen_width, 0, 0),      # top-right corner
    (screen_width, screen_height, 0),  # bottom-right corner
    (0, screen_height, 0)     # bottom-left corner
], dtype="double")

# The camera matrix (focal lengths and optical centers)
focal_length = size[1]  # 'size' is the image shape (height, width); the width approximates the focal length in pixels
center = (size[1]/2, size[0]/2)
camera_matrix = np.array([
    [focal_length, 0, center[0]],
    [0, focal_length, center[1]],
    [0, 0, 1]
], dtype="double")

# Assume no lens distortion
dist_coeffs = np.zeros((4,1))

# Use solvePnP to find the pose
(success, rotation_vector, translation_vector) = cv2.solvePnP(object_points, image_points, camera_matrix, dist_coeffs)

# The rotation and translation vectors contain the position and rotation of the camera relative to the screen
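# (Side note, not from the original output: if you test this on a desktop and want the full
# 3x3 rotation matrix rather than the Rodrigues vector, cv2.Rodrigues can convert it, e.g.
# R, _ = cv2.Rodrigues(rotation_vector).)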

and then it gave me this after I told it there’s no solvePnP:

import numpy as np
import cv2  # this version still relies on cv2 for findHomography

# The four corners of the screen in pixel coordinates
image_points = np.array([
    [x1, y1],  # top-left corner
    [x2, y2],  # top-right corner
    [x3, y3],  # bottom-right corner
    [x4, y4]   # bottom-left corner
])

# The size of the screen in real life (in meters)
screen_width = 0.6
screen_height = 0.35

# The four corners of the screen in real world coordinates
object_points = np.array([
    [0, 0],                 # top-left corner
    [screen_width, 0],      # top-right corner
    [screen_width, screen_height],  # bottom-right corner
    [0, screen_height]     # bottom-left corner
])

# Compute the homography H
H, _ = cv2.findHomography(object_points, image_points)

# Compute the camera pose from the homography
K = ...  # The intrinsic camera matrix
H1 = H[:, 0]
H2 = H[:, 1]
H3 = np.cross(H1, H2)

norm1 = np.linalg.norm(H1)
norm2 = np.linalg.norm(H2)
tnorm = (norm1 + norm2) / 2.0

T = H[:, 2] / tnorm
inv_K = np.linalg.inv(K)
Btilde = np.column_stack([H1, H2, H3])

B = inv_K.dot(Btilde)
B = B / B[2, 2]
d1 = B[:, 0]
d2 = B[:, 1]
d3 = B[:, 2]

np0 = np.dot(inv_K, T)
if np.linalg.det(B) < 0:
    d1 = -d1
    d2 = -d2
    d3 = -d3
    np0 = -np0

R = np.column_stack([d1, d2, d3])
t = np0

And then it gave the ulab version above after I mentioned there’s no cv2 module at all.

Hi, thanks for your incredibly speedy reply. I doubt OpenMV has a findHomography() function either, although I could be wrong. I’ll look into the ulab library and the method you’ve proposed from ChatGPT above, and I’ll be back if I have more questions. Thanks!

Yeah, we actually have this feature inside of the AprilTag library. It’s what spits out the tag pose. However, it’s only used for AprilTag stuff. I guess I can expose it.

Let me know if you can’t get it working with just ulab as it is and ChatGPT.

Is it possible to have a look at the source code for the AprilTag stuff, so I can take a peek at how it’s being implemented?

Sure:

matd_t *pose = homography_to_pose(det->H, -fx, fy, cx, cy);

lnk_data.x_translation = MATD_EL(pose, 0, 3);
lnk_data.y_translation = MATD_EL(pose, 1, 3);
lnk_data.z_translation = MATD_EL(pose, 2, 3);
lnk_data.x_rotation = fast_atan2f(MATD_EL(pose, 2, 1), MATD_EL(pose, 2, 2));
lnk_data.y_rotation = fast_atan2f(-MATD_EL(pose, 2, 0), fast_sqrtf(sq(MATD_EL(pose, 2, 1)) + sq(MATD_EL(pose, 2, 2))));
lnk_data.z_rotation = fast_atan2f(MATD_EL(pose, 1, 0), MATD_EL(pose, 0, 0));

That’s how we figure out your translation and rotation. MATD_EL just grabs a row/column element from a matrix.

-fx, fy, cx, and cy are the camera focal lengths and the image center in floating point. fx/fy are usually equal, and cx/cy are just the center point of the image.
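
If you compute the 3x3 R with the ulab sketch above, the Python equivalent of those three atan2 lines is roughly this (just a sketch; R is assumed to be indexable as R[row][col]):

import math

def rotation_to_euler(R):
    # Mirrors the firmware snippet: x/y/z rotation (roll/pitch/yaw) from a 3x3 rotation matrix.
    x_rotation = math.atan2(R[2][1], R[2][2])
    y_rotation = math.atan2(-R[2][0], math.sqrt(R[2][1] ** 2 + R[2][2] ** 2))
    z_rotation = math.atan2(R[1][0], R[0][0])
    return x_rotation, y_rotation, z_rotation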

And here’s how a homography is done: openmv/src/omv/imlib/apriltag.c at master · openmv/openmv (github.com).

Here’s the function: openmv/src/omv/imlib/apriltag.c at master · openmv/openmv (github.com)

This kinda math is definitely graduate-level. It may take you a while to wrap your head around it.

I was wondering why the blob rotation doesn’t meet your needs…
With blobs you can get width, height, center x/y, and rotation…

I’m actually a third-year Electrical and Electronics student, so I should be able to make sense of this… Does the code you provided require importing any Python modules? Also, is there a built-in method for performing SVD (singular value decomposition)?

Because I’m not just tracking the rotation of a rectangle about the axis from the camera to the rectangle (the z axis); I need the translation vector and rotation matrix from the camera to that plane. Blob tracking does not provide enough information for this. Even if I were to use it for detecting the corners of the rectangle, min_corners sometimes does not detect the corners properly (as mentioned in the post). I would upload an image to show what I mean, but apparently new users are not allowed to upload images.


You should be able to upload images now.

As for SVD support: I’ve brought this to the attention of the ulab developer. I don’t have an ETA on when it can be implemented, though.

The functionality is there, though, inside the AprilTag code in the C firmware that runs on the camera. If you are comfortable editing the firmware and reflashing it, you can easily expose this functionality for whatever you want.

See the rotation correction function’s C code. One of its modes executes a homography that remaps the image using four corner points.
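
I believe that code path is reachable from Python via the corners argument of img.rotation_corr(); a minimal usage sketch, with placeholder corner values standing in for the ones you already extract:

corners = [(x0, y0), (x1, y1), (x2, y2), (x3, y3)]  # your four detected screen corners
img.rotation_corr(corners=corners)  # remaps that quad using the firmware's homography code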

Thanks for allowing me to post pictures now. Editing the firmware seems like a very time-consuming thing to do given that my project has a deadline… Perhaps it might be easier to implement everything from scratch myself; I’m just not sure how the performance would be, especially on an embedded platform. For now I think my course of action is to read up more on homographies, PnP, and ulab, and try to implement everything from scratch.

Line tracking.bmp (225.1 KB)
Blob tracking.bmp (225.1 KB)
Here’s what blob tracking does compared to line tracking: blob tracking does not accurately retrieve the 4 corners using min_corners.

First of all, in the photo it seems that the blob isn’t only the screen.
Blob rotation will work like a charm on this.
You can do blob detection in grayscale for best results.
Besides that, you can even get the angle of the horizontal lines directly.

rotation()

Returns the rotation of the blob in radians (float). If the blob is like a pencil or pen this value will be unique for 0-180 degrees. If the blob is round this value is not useful.

You may also get this value doing [7] on the object.

rotation_deg()

Returns the rotation of the blob in degrees.

Like I mentioned earlier, I won’t only be rotating the camera around the axis from the camera to the screen. The camera can be shifted off to the side, top, or bottom; it can be rotated so the screen looks sheared; it can be rotated in all sorts of ways as long as the camera can still see the screen. I then need the translation vector of the camera relative to the screen, as well as the rotation matrix of the camera relative to the screen. Blob rotation only returns one axis of rotation, which is not useful to me because I need the full orientation of the camera. Getting the rotation of each line also does not help me find the position and rotation relative to the screen. And even then, I would still need the translation vector, which blob tracking does not provide.


Hi, the ulab folks are working on integrating SVD support now. I pointed him to where the SVD algorithm is and to the homography code, which he may also add. He said it should get done over the weekend.

That said, whenever things like this come up where we don’t have the features you need, we will try to get them mainlined. However, given your timeline, you’ll need to figure out a way to get this done sooner.

As for an embedded platform and lack of performance: the CPU is about as fast as a Raspberry Pi 2. It can do the math.

OK, I got it now.
But I don’t know how this will help you with the project, since if the camera is tilted on any other axis the screen will not be in your field of view…
All the eye tracking projects I’ve seen just track the eyes to move a cursor on a screen.
Never mind, good luck with your project, friend…

That would be great, actually. If he could get it done by the weekend, he’d probably be way faster than me trying to learn this from scratch. Would this mean that by the time the ulab team is done, I would be able to implement exactly what I want (something similar to the cv2.solvePnP() function from OpenCV)? If not, and they can provide documentation for their newly implemented functions, I’ll try and see if I can implement it myself (although I’ve been reading up on PnP for days now and still don’t quite get it). I’m thinking of deconstructing this problem into a simple 3D vector problem (from object to pinhole to image plane) and working it out from there, but I’m not sure whether I can assume a linear/pinhole camera model, and I’d need to know the real-life sensor size to figure out how much space a pixel takes up on the sensor, and whether windowing or resolution affects how much of the sensor is actually used (hence affecting the image plane size). I’m not sure if I’m severely overthinking this, but it has got me scratching my head quite a bit.
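
(To make that concrete: assuming a pinhole model, my rough plan for getting the focal length in pixels from the store page FOV, without needing the physical sensor size, is something like the sketch below. The resolution and FOV numbers are just placeholders, and it assumes the FOV figure applies to the resolution in use, i.e. no windowing.)

import math

img_w, img_h = 320, 240        # placeholder resolution
hfov_deg = 70.0                # placeholder horizontal field of view from the lens store page
fx = (img_w / 2) / math.tan(math.radians(hfov_deg) / 2)
fy = fx                        # assuming square pixels
cx, cy = img_w / 2, img_h / 2  # principal point assumed at the image center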

Actually, I think performance isn’t as big a concern as memory. I’ve run into many memory buffer errors on the OpenMV, and given the amount of matrix math involved, I’m not sure it will work well.

Hi, I intend for the camera to be able to tilt around any axis while still keeping the monitor in view, similar to a drone set to track an object while it moves around (best example I can think of right now). I intend to create the same thing you mentioned: using the eye’s gaze vector to move the mouse to where the user is looking on the screen. However, to do this I must know where the camera is relative to the screen, and that’s what I’m stuck on now. Thanks for all your help though, appreciate it.