I’m trying to derive the 3D pose of a detected AprilTag. In my opinion, there are some issues with the documentation - and possibly with the code as well. But most importantly, I’m missing an explicit definition of the AprilTag’s coordinate frame. So, one point after the other:

What is the coordinate frame of an AprilTag? Where is its origin, and where do its x-, y- and z-axes point? (This is important since it defines the rotation values, e.g. whether the x-/y-rotation is 0 or 180 degrees when looking straight at the tag.)

I think the documentation describing [xyz]_translation and [xyz]_rotation is incorrect:

apriltag.y_translation(): “Note that this is the up-to-down direction.” - I think Y is pointing up, since it is increasing when the tag is moving upwards within the camera image.

apriltag.x_rotation(): “Returns the rotation in radians of the apriltag in the X plane. E.g. moving the camera left-to-right while looking at the tag.” - I think it’s the rotation around the X axis. (What is the X plane anyway?) Also: Whether it increases or decreases depends on whether z_rotation is e.g. 0 or 180 degrees.

apriltag.y_rotation(): As with x_rotation, the axis and direction are incorrectly described.

I converted tag translations and rotations to 4x4 motion matrices, did some computations and transformations, and printed the resulting translations and rotations: the results were pretty strange until I found this line in your code: https://github.com/openmv/openmv/blob/master/src/omv/img/apriltag.c#L11946
Instead I’d compute the yaw with no minus sign. When replacing z_rotation with -z_rotation in my code, my results are as expected. I think you shouldn’t confuse the tag’s rotation within the image (which is a left-handed coordinate frame) with the rotation around the world’s z-axis (which is a right-handed system). I assume dropping the minus sign in that line of code yields the rotation convention usually used in computer vision applications: increasing angles when rotating from positive x to positive y, positive y to positive z, and positive z to positive x.
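To illustrate the convention I mean, here is a minimal pure-Python sketch (the helper names `rot_z` and `apply` are mine, not part of any OpenMV API): under the right-handed convention, a positive z-rotation turns the +x axis toward +y, whereas a clockwise rotation in the image corresponds to the opposite sign.

```python
import math

def rot_z(theta):
    # 3x3 rotation matrix about the z-axis (right-handed:
    # a positive angle turns +x toward +y).
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0],
            [s,  c, 0.0],
            [0.0, 0.0, 1.0]]

def apply(m, v):
    # Multiply a 3x3 matrix by a 3-vector.
    return [sum(m[i][j] * v[j] for j in range(3)) for i in range(3)]

# Rotating the +x unit vector by +90 degrees lands on +y, which is
# the convention described above. The tag's clockwise image rotation
# has the opposite sign, hence the suggested negation of z_rotation.
v = apply(rot_z(math.pi / 2), [1.0, 0.0, 0.0])
print([round(c, 6) for c in v])  # → [0.0, 1.0, 0.0]
```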

I hope my post is not too confusing and it is of some value for you and the project.
Overall, I’m totally amazed by OpenMV, its community and your awesome support!

Hi, thanks for this post. It’s going to take me some time to digest. That said, I’m not perfect on all this stuff. I only know the AprilTag part, not the pose part.

If you’d like more pose information and such, please let me know what you need/want, or send me a PR and we can fix it for you.

Note though, one of the goals with the OpenMV Cam is to be easier than OpenCV so we try to do as much work as possible for our users. So, instead of returning matrices I’d prefer to just return the result the user wants/needs.

So, can you describe a little bit more what would be an ideal method for you? Like, we can add a method call to the AprilTag object that returns something. I’m just not sure what that something is.

Um, about the PR stuff. Seriously, we could use the help, if you want something fixed be it code or documentation send a PR. If you need help setting up how to compile the code let me know.

Hey, thanks for the quick response! I see your point. Trying to provide a simpler API than OpenCV is a good goal.

Besides that, I thought about the coordinate frame once more and it starts to make sense:
I guess the rotation angles relate to the coordinate axes of the AprilTag, not to the axes of the camera. So the documentation would need to explain how the marker moves under a small x_rotation, y_rotation or z_rotation relative to an initial position, which is usually parallel to the camera (and - I guess - facing away from it). It becomes clear when thinking about where the tag’s coordinate axes point. Now that I know where to look, I can fiddle with it once again and probably figure it out tomorrow. And knowing both coordinate frames explains whether to negate the z_rotation (my third point).

So I’ll try to find a definition for the marker translation and rotation tomorrow. Then the code doesn’t need to be changed.

Here is my attempt to clarify the translations and rotations. First we need to define the axes of camera and AprilTag. The translation and rotation between the two frames usually follows naturally.

Coordinate frame of OpenMV Camera:

Origin: projection center of the camera
X-axis: pointing to the right (with respect to camera image)
Y-axis: pointing up (with respect to camera image)
Z-axis: pointing towards the camera (Z coordinate corresponds to negative distance, completing a right-handed coordinate frame)

Coordinate frame of an AprilTag:

Origin: marker center
X-axis: pointing to the right
Y-axis: pointing down
Z-axis: pointing into the marker (completing a right-handed coordinate frame)

The directions refer to an “upright” AprilTag as shown in the AprilTag project’s example images.

Translation and rotation of a detected AprilTag:

x_translation: translation from left to right with respect to the camera projection center as seen by the camera
y_translation: translation from bottom to top with respect to the camera projection center as seen by the camera
z_translation: negative z-distance between AprilTag and camera projection center; it increases towards 0 when getting closer
x_rotation: rotation around the AprilTag’s left-to-right x-axis, increases when lifting its upper half
y_rotation: rotation around the AprilTag’s top-to-bottom y-axis, increases when lifting its left half
z_rotation: rotation around the AprilTag’s z-axis, increases when turning clockwise

The z_rotation should turn the other way around to follow the conventions of a right-handed coordinate frame (e.g. Left- vs. Right-Handed Coordinate Systems).
Note that x_rotation is usually around 180 degrees since the camera’s and AprilTag’s z-axes are inverse to each other. This flip is apparent in x_rotation.
The flip around x yields y_rotations and z_rotations that seem to violate above-mentioned conventions of a right-handed coordinate frame, but instead are results of the concatenation of three rotations.
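The flip around x can be checked numerically. Here is a minimal pure-Python sketch (helper names `rot_x` and `apply` are mine) showing that a 180-degree rotation about x negates the y and z components, which is exactly the relationship between the tag’s down/into-the-marker axes and the camera’s up/towards-the-camera axes:

```python
import math

def rot_x(theta):
    # Right-handed rotation about the x-axis.
    c, s = math.cos(theta), math.sin(theta)
    return [[1.0, 0.0, 0.0],
            [0.0, c, -s],
            [0.0, s,  c]]

def apply(m, v):
    # Multiply a 3x3 matrix by a 3-vector.
    return [sum(m[i][j] * v[j] for j in range(3)) for i in range(3)]

# A 180-degree flip about x negates y and z: the tag's y-axis
# (pointing down in the image) maps onto the camera's up direction,
# and the tag's z-axis (into the marker) onto the camera's z-axis.
y_flipped = apply(rot_x(math.pi), [0.0, 1.0, 0.0])
z_flipped = apply(rot_x(math.pi), [0.0, 0.0, 1.0])
```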

Currently the concatenated rotation can be expressed as:
R = Rz(-z_rotation) * Ry(y_rotation) * Rx(x_rotation)
Again, the sign of z_rotation should be changed. Then the angles represent the very common aircraft principal axes (Aircraft principal axes - Wikipedia): yaw, pitch and roll.
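As a sanity check, here is a self-contained sketch of that composition (all helper names are mine, not OpenMV API): it builds R = Rz(-z_rotation) * Ry(y_rotation) * Rx(x_rotation) and recovers yaw, pitch and roll with the standard ZYX Euler-angle extraction, which works away from the pitch singularity.

```python
import math

def rot_x(a):
    c, s = math.cos(a), math.sin(a)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]

def rot_y(a):
    c, s = math.cos(a), math.sin(a)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]

def rot_z(a):
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def tag_rotation(x_rot, y_rot, z_rot):
    # Concatenated rotation as described above, with the sign of
    # z_rotation flipped so all three angles are right-handed.
    return matmul(rot_z(-z_rot), matmul(rot_y(y_rot), rot_x(x_rot)))

def euler_zyx(r):
    # Recover yaw, pitch, roll from an R = Rz * Ry * Rx matrix
    # (standard ZYX extraction).
    yaw = math.atan2(r[1][0], r[0][0])
    pitch = math.atan2(-r[2][0], math.hypot(r[2][1], r[2][2]))
    roll = math.atan2(r[2][1], r[2][2])
    return yaw, pitch, roll
```

With x_rotation near pi (the flip discussed above), the recovered roll comes back near pi as well, and yaw equals the negated z_rotation, matching the aircraft convention.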

I’d suggest keeping the current behavior of “rotation” (clockwise rotation in the image). So it would have a different sign than “z_rotation” (right-handed rotation around the z-axis). But I’m not sure where to implement this change: 12k lines in apriltag.c and 5k in py_image.c is quite some code to look into…

I just noticed that the translation doesn’t respect a given region of interest (roi). So if a fixed AprilTag is detected within different ROIs, the translation differs.

I mention it here just for the record. A proper fix is probably a bit more complicated.

Edit:
A workaround is to use the cx and cy arguments to fake a virtual principal point for the camera relative to the region of interest.
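A minimal sketch of that workaround, assuming the detector interprets cx and cy in ROI-local pixel coordinates (the helper and its name are mine, not part of the OpenMV API):

```python
def virtual_principal_point(img_w, img_h, roi):
    """Shift the full-image principal point (img_w/2, img_h/2)
    into the local coordinate frame of roi = (x, y, w, h)."""
    x, y, w, h = roi
    return (img_w / 2 - x, img_h / 2 - y)

# For a 160x120 image and a ROI in the upper-left quadrant, the
# principal point lands at the ROI's lower-right corner:
cx, cy = virtual_principal_point(160, 120, (0, 0, 80, 60))
print(cx, cy)  # → 80.0 60.0
```

The resulting values would then be passed along with the ROI, e.g. `img.find_apriltags(roi=roi, cx=cx, cy=cy, ...)`; whether this is exactly how the AprilTag code consumes cx/cy is my assumption.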

Thanks for finding these bugs! I will look into this when I have time. Please send another PR to adjust cx and cy by the ROI center. I believe this is the correct thing to do since I don’t share any ROI info with the AprilTag code so it would need to know cx and cy have moved. It assumes it’s looking at the center of the field of view.