Recorded stream annotation


I’m recording raw OpenMV H7 streams to use for developing a machine vision algorithm. When initially prototyping the algorithm it was fine to run it on the recorded streams, draw some relevant lines, and manually validate how the code was doing. Now that we’re getting more and more recorded streams, however, this is becoming time-consuming and hard. I would like to instead hand-annotate the recordings with the correct output somehow, to create a dataset with known output. This way I could evaluate my algorithm using regular supervised methods, and numerically describe how well it performs across all datasets.

Do you have any suggestions on how to achieve this? I’m thinking some tool which would display recorded frames and let me draw the correct output every now and then (and interpolate between these keyframes) would make this process sort of doable. Should I mod the IDE? Create my own tool reading raw streams?


Hi, what type of algorithms are you doing? Have you checked out Edge Impulse CNNs for object classification? They will have bounding box support soon.

I work for, and am implementing a system for real-time tracking of a rail while driving. I didn’t think CNN was the right choice because I’m under the impression it’s not fast enough (we need about 30-40fps). Essentially we’re detecting a conceptual line, and would like to compare the output (slope/offset) to a human-defined truth.

It’s also interesting to note that any frame is highly dependent on the previous one, which can be leveraged to improve robustness.
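For the slope/offset comparison mentioned above, a minimal sketch of one possible score could look like this. It assumes each frame's output and each annotation is a (slope, offset) pair and uses mean absolute error; the function name and the choice of metric are illustrative, not something fixed by the project.

```python
def mean_abs_error(predicted, truth):
    """Return (slope_mae, offset_mae) for per-frame (slope, offset) pairs.

    Hypothetical helper: both lists are assumed to hold one
    (slope, offset) tuple per frame, in the same frame order.
    """
    if len(predicted) != len(truth):
        raise ValueError("need exactly one prediction per annotated frame")
    if not truth:
        raise ValueError("no frames to compare")
    n = len(truth)
    slope_err = sum(abs(p[0] - t[0]) for p, t in zip(predicted, truth)) / n
    offset_err = sum(abs(p[1] - t[1]) for p, t in zip(predicted, truth)) / n
    return slope_err, offset_err
```

Running this once per recording gives two numbers per algorithm variant, which can be compared directly instead of rewatching the video.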

Ah, okay, so you are using something like find_line_segments and find_lines.

So, what is the exact pain point? ImageReader and ImageWriter allow you to replay your code on what was recorded.

It’s now called ImageIO… Anyway, I think I understand what you want to do: you want to draw the lines on key frames and have the tool interpolate between them, so it’s less manual work. This is just too application-specific, the IDE can’t do that for you. You can try to implement it yourself, since the IDE is open source, but it’s probably a lot easier to write something with OpenCV + Python. The raw video format is really easy to parse, and you can output the truth table to a txt file and read it using the frame number as the index.
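As a rough sketch of the keyframe idea, assuming the truth for each frame is a (slope, offset) pair and that linear interpolation between annotated frames is good enough: the function names and the one-line-per-frame text layout below are hypothetical, chosen only so that the line number in the file matches the frame number.

```python
import bisect


def interpolate_keyframes(keyframes, n_frames):
    """Expand sparse annotations into one (slope, offset) per frame.

    keyframes: dict mapping frame index -> (slope, offset), hand-annotated.
    Frames before the first or after the last keyframe reuse that keyframe.
    """
    keys = sorted(keyframes)
    if not keys:
        raise ValueError("need at least one keyframe")
    truth = []
    for i in range(n_frames):
        if i <= keys[0]:
            truth.append(keyframes[keys[0]])
        elif i >= keys[-1]:
            truth.append(keyframes[keys[-1]])
        else:
            # Find the keyframes that bracket frame i.
            j = bisect.bisect_right(keys, i)
            lo, hi = keys[j - 1], keys[j]
            t = (i - lo) / (hi - lo)
            s0, o0 = keyframes[lo]
            s1, o1 = keyframes[hi]
            truth.append((s0 + t * (s1 - s0), o0 + t * (o1 - o0)))
    return truth


def save_truth(path, truth):
    """Write one 'slope offset' line per frame; line index = frame number."""
    with open(path, "w") as f:
        for slope, offset in truth:
            f.write(f"{slope} {offset}\n")


def load_truth(path):
    """Read the table back as a list indexed by frame number."""
    with open(path) as f:
        return [tuple(float(x) for x in line.split()) for line in f if line.strip()]
```

For example, `interpolate_keyframes({0: (0.0, 10.0), 4: (2.0, 18.0)}, 6)` fills in frames 1 to 3 between the two annotated frames and repeats the last keyframe for frame 5.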

@kwagyeman I could be using line_segments or find_blobs or whatever. The point is that if I come up with a new way to do it, which I want to try, I don’t want to watch minutes or hours of video to try to see if it’s better or worse than the previous one.

Yes exactly @iabdalkader :). I dunno if it’s application-specific really, half of machine learning is based on having supervised data to train on - I just want to supervise. I’m completely fine with you not wanting it in the IDE though. Would you consider adding it if I made a PR?

Otherwise, a pointer to how to parse the raw-format would be helpful, I can code some custom annotator from there.

Yes, data is mostly labeled manually, or someone writes specific code to label it, unless it’s something generic like adding a label to an image, which the IDE can do already.

It’s not like we don’t want it in the IDE, I was just explaining why it’s not there already. Yes, we welcome any contributions.

The format is not documented. It starts with a 16-byte header, which you can skip, followed by frames until EOF. Each frame is:
4 bytes timestamp
4 bytes width
4 bytes height
4 bytes BPP
Image data (w * h * bpp bytes)…
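
A minimal Python sketch of a reader for the layout above. Several things here are assumptions and not confirmed by the description: the four fields are taken to be little-endian unsigned 32-bit integers, BPP is taken to mean bytes per pixel, and no padding is assumed between frames. The function name read_frames is hypothetical.

```python
import struct

HEADER_SIZE = 16  # file header, skipped as described above
# Assumption: timestamp, width, height, bpp are little-endian uint32.
FRAME_HEADER = struct.Struct("<4I")


def read_frames(stream):
    """Yield (timestamp, width, height, bpp, data) for each frame.

    stream: a binary file-like object positioned at the start of the file.
    Stops at EOF or at the first truncated frame.
    """
    stream.read(HEADER_SIZE)
    while True:
        head = stream.read(FRAME_HEADER.size)
        if len(head) < FRAME_HEADER.size:
            return  # clean EOF or truncated frame header
        timestamp, width, height, bpp = FRAME_HEADER.unpack(head)
        size = width * height * bpp  # assumes bpp is bytes per pixel
        data = stream.read(size)
        if len(data) < size:
            return  # truncated image data
        yield timestamp, width, height, bpp, data
```

Used as `for i, frame in enumerate(read_frames(open(path, "rb")))`, the index `i` is the frame number that a truth table can be keyed on. If the frame sizes that come out look wrong, the endianness or BPP assumption is the first thing to check against the actual code.
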

See the code here:

Great, I’ll be back!