Image Segmentation in OpenMV H7 Plus

Hi,

I’m interested in trying out the tf.segment function on the OpenMV H7 Plus camera. Does anyone know of any pre-trained models that work well and could be used for testing?

I’ve been trying to create an image segmentation model that runs on the OpenMV H7 Plus so I can test the function. I’m building it on a U-Net architecture, simplified as much as possible. So far I’ve managed to create a model that weighs 1.9 MB (in int8). It loads onto the camera without compilation errors, but tf.segment takes around 10 seconds to run, and the resulting image doesn’t seem to be properly segmented (when overlaid with img.blend()). I suspect the model is still too complex to run well on the camera.

I’d be grateful if anyone knows of projects that have achieved an efficient segmentation model compatible with the OpenMV H7 Plus camera for testing the tf.segment() function.
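Roughly, the test loop I’m running looks like this (the model filename is a placeholder and the exact call form may vary with firmware version):

```python
import sensor, tf

sensor.reset()
sensor.set_pixformat(sensor.RGB565)
sensor.set_framesize(sensor.QVGA)
sensor.skip_frames(time=2000)

# Load the quantized U-Net from the SD card ("unet_96_int8.tflite" is a placeholder name).
net = tf.load("unet_96_int8.tflite", load_to_fb=True)

while True:
    img = sensor.snapshot()
    masks = net.segment(img)        # list of grayscale images, one per output channel
    # Overlay the first channel's mask as a quick visual check
    # (the mask is at the model's output resolution, so it may need scaling first).
    img.blend(masks[0], alpha=128)
```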

Best regards

Hi, 10 seconds seems about right for any high-end model. If you shrink the window size to something like 96x96 it will go faster.
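E.g., something along these lines at the top of your script (a sketch, adjust to your sensor setup):

```python
import sensor

sensor.reset()
sensor.set_pixformat(sensor.RGB565)
sensor.set_framesize(sensor.QVGA)
# Crop the sensor output down to the region the network will actually see,
# so less scaling work is needed before inference.
sensor.set_windowing((96, 96))
sensor.skip_frames(time=2000)
```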

The segmentation method output does this:

[h][w][c] → for each channel c, it outputs an Image(w, h) built from that channel’s values.

So, if your model output is not in that layout, it won’t work as expected. Please note we have a new PR about to be merged that will let you do whatever you want with CNNs: modules/py_tf: Add generic CNN processing support. by kwagyeman · Pull Request #2227 · openmv/openmv (github.com)

With that PR you can just write a callback to handle the image output in whatever way you want.

Thank you for your response!

Regarding the new pull request (modules/py_tf: Add generic CNN processing support), when will it be available for testing? Also, will there be a firmware update that includes these new methods? That would be really helpful, as I am quite new to the OpenMV camera and embedded systems in general, and I’m not sure I can properly modify the firmware version on my own.

Thanks again for your help!

Best regards

We’re going through the PR review process now.

Alright, thank you very much. Returning to the tf.segment() function, I’ve been experimenting with the .tflite model for U-Net-based segmentation. The model’s input dimensions are (1, 96, 96, 3), and the output is the same. As you mentioned, that layout wouldn’t be valid for tf.segment(). I’ll need to either find a model with input and output dimensions of (96, 96, 3), or figure out how to add that first dimension to the input images and remove it from the model’s output. As far as I know, the extra dimension is the batch dimension Keras adds for compatibility with batched data.
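As a sanity check outside the camera, I’ve been inspecting the tensor shapes with the standard TFLite interpreter, something like this (filename is a placeholder):

```python
# Desktop-side check of the .tflite tensor shapes (standard TensorFlow, not OpenMV code).
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="unet_96_int8.tflite")
interpreter.allocate_tensors()

print("input :", interpreter.get_input_details()[0]["shape"])   # e.g. [1, 96, 96, 3]
print("output:", interpreter.get_output_details()[0]["shape"])  # e.g. [1, 96, 96, 3]
```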

On another note, how does tf.segment prepare the images internally before passing them to the model for prediction? I ask because when I test the model outside the OpenMV camera, I have to use a series of functions like Keras’s img_to_array() before making a prediction, so I’m not sure whether, even after correcting the dimensions, the model would work directly with tf.segment.

Apologies for the beginner-level questions, and thank you in advance for your help.

Best regards,

Hi, the batch size is ignored. If we see that it’s 1 we just skip it. Your model does output something that should work.

In the case of your output, you’d expect 3 grayscale images to be output that would be 96x96x1 each. Segment would map the value in each channel to a pixel in the final output image.
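So, roughly, you could inspect each returned channel like this (an untested sketch; the filename is a placeholder):

```python
import sensor, tf

sensor.reset()
sensor.set_pixformat(sensor.RGB565)
sensor.set_framesize(sensor.QVGA)
sensor.skip_frames(time=2000)

net = tf.load("unet_96_int8.tflite", load_to_fb=True)

img = sensor.snapshot()
masks = net.segment(img)              # for your model: 3 grayscale 96x96 images
for i, mask in enumerate(masks):
    stats = mask.get_statistics()     # per-channel activation statistics
    print("channel", i, "mean:", stats.mean())
```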

On the OpenMV Cam, we handle everything about feeding an image into the model for you. However, we don’t know the scale of the image and/or the mean/stdev your model expects for standardization.

The PR that’s currently in review will add support for controlling these. The issue you’re having is most likely that your model wasn’t trained to accept an input array of 0-1 pixel values in floating-point format; that’s what the current library feeds it. With the new PR we can support more input types.
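In other words, on the training side the inputs should match what the firmware feeds in. A rough illustration (plain NumPy on the desktop, not firmware code; the array name is a placeholder for however you load your dataset):

```python
import numpy as np

# Placeholder for your training images loaded as uint8 (0-255).
images_uint8 = np.random.randint(0, 256, size=(16, 96, 96, 3), dtype=np.uint8)

# Scale to the 0-1 float range that the current OpenMV tf library feeds the model.
images_float = images_uint8.astype(np.float32) / 255.0

# model.fit(images_float, masks, ...)   # train on the scaled data
print(images_float.dtype, images_float.min(), images_float.max())
```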

Thank you very much, your response has been incredibly helpful. The issue was indeed as you described, and it’s now fixed. The camera is now producing segmentation masks correctly, with an inference time of 14 seconds for 96x96 images. I’m trying to reduce that time, but I think the only option might be to use images smaller than 96x96 (considering that the model is already in int8 and is the smallest I obtained without losing much precision).

I’m looking forward to testing the PR you mentioned; it seems very useful for deep learning applications on the H7 Plus camera.

Thank you very much, and best regards.