Quantized model on OpenMV Cam H7 Plus

Hi,

I am hoping to run an object detection model on the OpenMV Cam H7 Plus. I am trying to use SSD MobileNet V2 FPNLite 320. After training, the model is 11.5 MB as a .tflite, but by quantizing it I can reduce the size to 3.7 MB, which is smaller than the 4.2 MB available on the heap for the H7 Plus. However, when I try to load the model onto the device I get the error "Failed to load model: Failed to allocate tensors". By rebuilding the firmware and using -release_with_logs.a, I was able to get the more specific error "tflm_backend: Failed to get registration from op code CUSTOM". After looking through some forums, I made sure to quantize using this code:

import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_saved_model('path_to_model/saved_model/')
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_data_gen
# Restrict the converter to full-integer (int8) ops, with uint8 input/output tensors.
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.target_spec.supported_types = [tf.int8]
converter.inference_input_type = tf.uint8
converter.inference_output_type = tf.uint8
tflite_model = converter.convert()
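
For reference, representative_data_gen above only needs to yield a small number of sample inputs with the same shape as the model input. A minimal sketch of what that generator could look like (the image directory, image count, and preprocessing here are assumptions, not the exact code used):

import glob
import numpy as np
from PIL import Image

def representative_data_gen():
    # Yield ~100 calibration images shaped like the model input (1, 320, 320, 3).
    for path in glob.glob('path_to_calibration_images/*.jpg')[:100]:
        img = Image.open(path).convert('RGB').resize((320, 320))
        arr = np.asarray(img, dtype=np.float32) / 255.0  # scaling is an assumption
        yield [arr[np.newaxis, ...]]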

The problem persisted when quantizing this way. I am able to successfully detect objects in test images with the quantized model on my machine. The code I am using to load the model onto the camera is very simple:

import sensor, image, time, os, ml, math, uos, gc
from ulab import numpy as np

sensor.reset()                         # Reset and initialize the sensor.
sensor.set_pixformat(sensor.RGB565)    # Set pixel format to RGB565 (or GRAYSCALE)
sensor.set_framesize(sensor.HVGA) # Modify as you like.
sensor.set_windowing((320, 320))
sensor.skip_frames(time=2000)          # Let the camera adjust.


print(f"Model size = {uos.stat('detect_quant.tflite')[6]}")
print(f"Available space = {gc.mem_free() - (64*1024)}")
try:
    # Load into the frame buffer if the model won't fit on the heap (keeping 64 KB of headroom).
    net = ml.Model("detect_quant.tflite", load_to_fb=uos.stat('detect_quant.tflite')[6] > (gc.mem_free() - (64*1024)))
    print('Successfully loaded model')
except Exception as e:
    print('Failed to load model: ' + str(e))

Do you have any idea what the unsupported custom OP might be? Is there anything I can change in the way that I train or quantize the model to make it able to be loaded onto OpenMV Cam H7 Plus?

Load your model here: Netron

And then cross check the ops used and make sure they are in: openmv/src/lib/tflm/tflm_backend.cc at master · openmv/openmv
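
If you'd rather not read the graph in Netron, you can also dump the op list on the desktop and cross-check it against the backend. A quick sketch using the TFLite analyzer (available in TensorFlow 2.9+; the file name is a placeholder):

import tensorflow as tf

# Prints the model's graph structure, including every builtin and custom op it uses.
tf.lite.experimental.Analyzer.analyze(model_path="detect_quant.tflite")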

All of the ops are supported except for the output layer, which uses TFLite_Detection_PostProcess. It looks like this op is commented out in tflm_backend.cc. Do you know of any workarounds for this?

Hi, since you are compiling the firmware, just enable it: openmv/src/lib/tflm/tflm_backend.cc at master · openmv/openmv

Otherwise you want to shave that operator off the network and do the post-processing in Python code like this: openmv/scripts/libraries/ml/ml/postprocessing.py at master · openmv/openmv
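
For reference, after the raw box decoding, what that op does is essentially score thresholding plus greedy non-max suppression, which is straightforward to write in Python. A minimal, illustrative sketch (this is not the OpenMV library code; anchor decoding is omitted and the names are made up):

def iou(a, b):
    # Boxes are (ymin, xmin, ymax, xmax) in normalized coordinates.
    y1, x1 = max(a[0], b[0]), max(a[1], b[1])
    y2, x2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, y2 - y1) * max(0.0, x2 - x1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / (union + 1e-9)

def nms(boxes, scores, score_thresh=0.5, iou_thresh=0.5):
    # Greedily keep the highest-scoring boxes that do not overlap an already-kept box.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if scores[i] < score_thresh:
            break
        if all(iou(boxes[i], boxes[j]) < iou_thresh for j in keep):
            keep.append(i)
    return keep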

Thanks. By recompiling the firmware and using load_to_fb=True, I was able to successfully load the model. However, I have one more question for you. When I try to feed an image into the network I get "ValueError: unexpected tensor shape". I am unsure what the issue is, because when I print the input tensor shape and the image dimensions they seem to line up. Is there a way to get more explicit feedback on which dimensions did not align? One possibility is that the model expects images in RGB888 format rather than RGB565. I will attach the Python code I am running on the device, as well as the Jupyter notebook I used for training, if you want more information.
detect_mobilenet.py (1.6 KB)
train_tflite.zip (6.4 KB)

Can you just print() the model itself and post that info here? This will list the tensors. Note that we support up to 4D tensors.
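
Something like this on the camera is enough (assuming the model loads the same way as before):

import ml

net = ml.Model("detect_quant.tflite", load_to_fb=True)
print(net)  # lists the input/output tensor shapes, scales, zero points, and dtypes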

Preprocessing for the image input is here: openmv/scripts/libraries/ml/ml/preprocessing.py at master · openmv/openmv

We handle the RGB565 to RGB888 conversion.

That error is from here: openmv/src/omv/modules/py_ml.c at master · openmv/openmv

Something is going wrong in either the input or output tensor shape. The error implies that either the input or output tensor has no dims.

Model dims are captured here: openmv/src/lib/tflm/tflm_backend.cc at master · openmv/openmv

We pretty much grab all the tensor info directly from the TensorFlow model. Anyway, seeing the model tensor shapes will help debug this.

Here is the model printed from the H7 Plus:

{ model_size: 3741072, model_addr: 0xc1c6ea6c, ram_size: 3566032, ram_addr: 0xc0017780, input_shape: ((1, 320, 320, 3),), input_scale: (0.00392157,), input_zero_point: (0,), input_dtype: ('B',), output_shape: ((), (), (), ()), output_scale: (1.0, 1.0, 1.0, 1.0), output_zero_point: (0, 0, 0, 0), output_dtype: ('f', 'f', 'f', 'f') }

It looks like the output shape has no dimensions. If I print the input and output details of the model when it is loaded in TensorFlow Interpreter on my machine, it looks like this:

input details = [
	{'name': 'serving_default_input:0', 
	'index': 0, 
	'shape': array([  1, 320, 320,   3], dtype=int32), 
	'shape_signature': array([  1, 320, 320,   3], dtype=int32), 
	'dtype': <class 'numpy.uint8'>, 
	'quantization': (0.003921568859368563, 0), 
	'quantization_parameters': {
	'scales': array([0.00392157], dtype=float32), 
	'zero_points': array([0], dtype=int32), 
	'quantized_dimension': 0}, 
	'sparsity_parameters': {}}]


output details = [

	{'name': 'StatefulPartitionedCall:1', 
	'index': 383, 
	'shape': array([ 1, 10], dtype=int32), 
	'shape_signature': array([ 1, 10], dtype=int32), 
	'dtype': <class 'numpy.float32'>, 
	'quantization': (0.0, 0), 
	'quantization_parameters': {
	'scales': array([], dtype=float32), 
	'zero_points': array([], dtype=int32), 
	'quantized_dimension': 0}, 
	'sparsity_parameters': {}}, 

	{'name': 'StatefulPartitionedCall:3', 
	'index': 381, 
	'shape': array([ 1, 10,  4], dtype=int32), 
	'shape_signature': array([ 1, 10,  4], dtype=int32), 
	'dtype': <class 'numpy.float32'>, 
	'quantization': (0.0, 0), 
	'quantization_parameters': {
	'scales': array([], dtype=float32), 
	'zero_points': array([], dtype=int32), 
	'quantized_dimension': 0}, 
	'sparsity_parameters': {}}, 

	{'name': 'StatefulPartitionedCall:0', 
	'index': 384, 
	'shape': array([1], dtype=int32), 
	'shape_signature': array([1], dtype=int32), 
	'dtype': <class 'numpy.float32'>, 
	'quantization': (0.0, 0), 
	'quantization_parameters': {
	'scales': array([], dtype=float32), 
	'zero_points': array([], dtype=int32), 
	'quantized_dimension': 0}, 
	'sparsity_parameters': {}}, 

	{'name': 'StatefulPartitionedCall:2', 
	'index': 382, 
	'shape': array([ 1, 10], dtype=int32), 
	'shape_signature': array([ 1, 10], dtype=int32), 
	'dtype': <class 'numpy.float32'>, 
	'quantization': (0.0, 0), 
	'quantization_parameters': {
	'scales': array([], dtype=float32), 
	'zero_points': array([], dtype=int32), 
	'quantized_dimension': 0}, 
	'sparsity_parameters': {}}]
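
For reference, I printed those details on my machine with the standard interpreter API, roughly like this:

import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="detect_quant.tflite")
interpreter.allocate_tensors()
print("input details =", interpreter.get_input_details())
print("output details =", interpreter.get_output_details())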

Hmmm, our library isn’t able to handle that output at all. However, it may be something I can easily parse. It appears to be a four-tensor output. The library should have handled that, though, so there’s something we can improve in our code.

Okay, I need this from you:

  1. An input image to test with. Best if it’s the exact size of the input to the model.
  2. The exact expected values of the output tensors.

I can then update our backend to handle this model and run correctly.
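
For the expected values, running the quantized model once on the desktop and dumping all four outputs is enough. A sketch along these lines (file names are placeholders):

import numpy as np
import tensorflow as tf
from PIL import Image

interpreter = tf.lite.Interpreter(model_path="detect_quant.tflite")
interpreter.allocate_tensors()

# Load a 320x320 RGB test image and add the batch dimension (uint8, matching the input dtype).
img = np.asarray(Image.open("test_image.png").convert("RGB").resize((320, 320)), dtype=np.uint8)
interpreter.set_tensor(interpreter.get_input_details()[0]['index'], img[np.newaxis, ...])
interpreter.invoke()

# Dump every output tensor (scores, boxes, detection count, classes).
for detail in interpreter.get_output_details():
    print(detail['name'], interpreter.get_tensor(detail['index']))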

Did you check this? Heads up custom Tflite models with Latest versions of Tensorflow

Here are the output tensors:

output_tensor_1 = [[0.99609375 0.0078125  0.00390625 0.00390625 0.00390625 0.00390625
  0.00390625 0.00390625 0.00390625 0.00390625]]

output_tensor_2 = [[[ 0.23265359  0.41437462  0.83452284  0.58187926]
  [ 0.3753295   0.26537663  0.5947138   0.64145315]
  [-0.01361288  0.3794297   0.04515417  0.5357584 ]
  [ 0.07706149  0.55290985  0.14303254  0.71663904]
  [ 0.09968229  0.5476206   0.16877642  0.7164772 ]
  [ 0.12220028  0.54266745  0.19625843  0.71815974]
  [ 0.14538985  0.5432126   0.22415906  0.7187049 ]
  [ 0.17793639  0.5033666   0.2459734   0.66836286]
  [ 0.1711407   0.5456453   0.25113377  0.7184525 ]
  [ 0.03213066  0.60696185  0.38053498  0.67497635]]]

output_tensor_3 = [10.]

output_tensor_4 = [[0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]]

image_and_model.zip (2.5 MB)

Hi, I’ve made an issue tracker to work on this: Update tensorflow to handle post-processed model · Issue #2543 · openmv/openmv

However, I will not be able to get to it for a while. If you want to try to make the changes to the firmware yourself to make it work this would be appreciated and then I can review the PR.

The issue is the parsing in this function.

All headers for the library are here: libtflm/include at 73eb1aa07416cad0fd64a8405ca9eb5c4e878e87 · openmv/libtflm

Did you mean to provide a link to the function that does the parsing? You only included a link to the headers.

That’s buried in the TensorFlow lite code. tensorflow/tflite-micro: Infrastructure to enable deployment of ML models to low-power resource-constrained embedded targets (including microcontrollers and digital signal processors).

Given that the model loads, the issue is just the parsing of the headers.

Any idea when you can get around to fixing this issue?

I’ll see what I can do this week.