Camera pose (position+orientation) from AprilTag

I have an application that I feel should be fairly common: I know the locations of an AprilTag’s corners, and that tag is visible and detected by the camera. I would like to know the camera’s full pose (position and orientation). This looks close to easy right now, but not easy enough to justify my doing it immediately. So, in the meantime, here are my notes for others who might be interested in this capability (and perhaps in solving it themselves!).

The find_apriltags_3d_pose.py example demonstrates how to get the position and orientation of an observed tag. This is almost the solution already: “all we have to do” is invert this pose to get the camera pose in the tag frame (instead of the tag pose in the camera frame). But computing the pose from the information available looks to be non-trivial. The pose matrix is known at the C level, but that information is only propagated to the Python level through x_rotation, y_rotation, and z_rotation. Those values are probably more useful than the pose matrix itself in many circumstances, but unfortunately not in this one. It sounds like we may get the homography matrix at some point, which could be used to compute the pose through an existing algorithm, but I don’t think we have that yet either.
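For reference, here is a minimal sketch of the inversion step itself, assuming the tag pose in the camera frame were available as a 3x3 rotation matrix R and a translation vector t (which is exactly the part not exposed to Python at the moment). The function names are hypothetical and nothing here is part of the OpenMV API; it is plain Python with no dependencies. For a rigid transform, the inverse is R transposed together with the translation -Rᵀt.

```python
def transpose3(m):
    """Transpose a 3x3 matrix given as a list of rows."""
    return [[m[j][i] for j in range(3)] for i in range(3)]


def mat_vec3(m, v):
    """Multiply a 3x3 matrix by a length-3 vector."""
    return [sum(m[i][k] * v[k] for k in range(3)) for i in range(3)]


def invert_pose(r_tag_in_cam, t_tag_in_cam):
    """Turn 'tag pose in camera frame' into 'camera pose in tag frame'.

    Assumes r_tag_in_cam is a proper rotation matrix, so its inverse
    is simply its transpose.
    """
    r_cam_in_tag = transpose3(r_tag_in_cam)
    t_cam_in_tag = [-c for c in mat_vec3(r_cam_in_tag, t_tag_in_cam)]
    return r_cam_in_tag, t_cam_in_tag


# Example: tag rotated 90 degrees about the camera's z axis, offset (1, 2, 3).
r = [[0, -1, 0],
     [1, 0, 0],
     [0, 0, 1]]
t = [1, 2, 3]
r_inv, t_inv = invert_pose(r, t)
```

The hard part remains getting R in the first place; rebuilding it from x_rotation, y_rotation, and z_rotation would require knowing the exact angle convention used at the C level, which I have not verified.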

The mavlink_apriltags_landing_target.py example (Examples → Interface Library → MAVLink) shows how the distance to the tag can be scaled to physical reality, but doesn’t deal with positioning or orienting. (Side note: I’m a bit surprised; I would have expected drones to want to control their horizontal plane movement according to a landing target, not just know their distance to the landing target, and that is pretty close to what I want. But I don’t see an example implementing that.)

In the meantime, I’m going to assume the camera is perpendicular to the plane containing the AprilTag and only correct for z_rotation.
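A rough sketch of that simplified case, assuming the tag's translation in the camera frame (tx, ty, tz) and z_rotation in radians are already in hand, and assuming z_rotation is a right-handed rotation about the camera's optical axis. The function name and the sign conventions are my assumptions and may need flipping to match what the library actually reports.

```python
import math


def camera_in_tag_frame_planar(tx, ty, tz, z_rotation):
    """Camera position and heading in the tag frame, assuming the camera
    looks straight at the tag so that only z_rotation is non-zero.

    With R = Rz(z_rotation), the camera position is -R^T * t.
    """
    c = math.cos(z_rotation)
    s = math.sin(z_rotation)
    x = -(c * tx + s * ty)
    y = -(-s * tx + c * ty)
    z = -tz
    heading = -z_rotation  # camera's rotation about z as seen from the tag
    return x, y, z, heading


# Example: tag rotated 90 degrees about the optical axis, offset (1, 2, 3).
x, y, z, heading = camera_in_tag_frame_planar(1.0, 2.0, 3.0, math.pi / 2)
```

Any tilt of the camera away from perpendicular shows up as position error here, so this is only a stopgap until the full pose is available.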

If anyone solves the camera-pose-from-known-AprilTag problem, I’d be interested in the solution :slight_smile: I’ll also try to post here if I end up solving the general case.

Hi, would it help to just compute this and output it with the tag too? Would be pretty easy to do I think.

Given current workloads, I probably won’t get to fixing high-level vision library stuff like AprilTags until next year, but if you have the C chops, I would love to see a PR to add this feature.