Hi all,
In our latest firmware, v4.5.6, we’ve completely updated how our TensorFlow API works. This is a breaking change, but the increase in functionality is worth it. Here are the details:
The “tf” module is now gone (although there may be an alias to it for a while). You should start using the new “ml” module instead, which is where we will keep any machine-learning code in the firmware. We renamed the module because, as we support more MCUs with CNN accelerators (NPUs), TensorFlow will not always be the execution engine under the hood, so we opted for a generic name going forward. It also lets us put other algorithms in this module. That said, the module currently only supports the TensorFlow Lite for Microcontrollers framework.
Moving on, v4.5.6 has a number of substantial improvements that allow you to run all kinds of models.
- We enabled pretty much every operator in TensorFlow on every OpenMV Cam model. Only the old OpenMV Cam M4 doesn’t have TensorFlow support onboard.
- We updated to the latest version of TensorFlow. We will now track it directly versus using an old branch from Edge Impulse.
- Model state is now stored in the heap versus on the frame buffer stack. This means that models now hold their state across inference calls. This is huge, as you can now work with models that have memory onboard.
- We massively increased the heap on all OpenMV Cam boards. On cameras with SDRAM support, the heap is now a couple of megabytes in size, up from 256KB. On ones without SDRAM we managed to find a few hundred more KB to add. We were able to do this thanks to a new MicroPython feature that lets you allocate heap blocks located at different addresses (and thus in different SRAM/SDRAM locations). While this means there are now multiple heap areas on each board, they are all managed automatically by MicroPython as if they were one. We had to increase the heap size so that the tensor arena, which holds the model state, could remain allocated across inference calls. The heap is now 8MB+ on the RT1062, 4MB+ on the H7 Plus, 8MB+ on the Pure Thermal, and 2.5MB+ on the Arduino Giga/Portenta! We may increase the heap even more on some boards.
- We enabled 4-dimensional ndarrays (up from 2, which was not that useful) using the ulab numpy module on every OpenMV Cam. This brings the power of numpy and vector processing to every OpenMV Cam along with ML support (see the short sketch after this list). As I’ll explain below, our embrace of numpy for data processing is key. At first, I didn’t think going all in on numpy support made sense, since enabling fast 4D ndarray support costs a lot of flash space. But it’s the right decision, and it makes the impossible possible for MCUs programmed in Python. Get excited.
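Here’s a minimal sketch of what 4D ndarray slicing looks like with ulab; the shape and contents are made up for illustration, but it’s the same slicing pattern used in the FOMO post-processing code later in this post:

from ulab import numpy as np

# A made-up (1, 4, 4, 3) tensor: 1 batch, 4x4 "pixels", 3 channels.
a = np.zeros((1, 4, 4, 3))

# Slice out one (4, 4) channel plane and rescale it; the slicing and math run in C.
plane = a[0, :, :, 0] * 255
print(plane.shape)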
Cool, so let’s talk about the new Model() object. We understand that a massive breaking change to the TensorFlow API is a pain. So, when doing this refactoring work, we tried to future-proof it as much as possible so that folks don’t have to change all their code again. The new Model() object supports multi-input and multi-output networks where each input and output tensor can have up to 4 dimensions.
What does this mean? Well, let’s say you want to train a model that accepts images, voice samples, and accelerometer samples at the same time. You can feed those into the new ml module as three separate input tensors. And, if your model outputs YOLO-like bounding boxes, scene descriptions, etc., you can handle each of these outputs separately.
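Here’s a rough sketch of what calling such a model could look like. The file name, tensor shapes, and output meanings below are hypothetical; only the Model()/predict() call pattern comes from the new API described in this post:

import ml
from ulab import numpy as np

# Hypothetical multi-input model; the file name and tensor shapes are made up.
net = ml.Model("multimodal.tflite")

img_in = np.zeros((1, 96, 96, 1))      # image input tensor
audio_in = np.zeros((1, 16000, 1, 1))  # voice sample input tensor
accel_in = np.zeros((1, 128, 3, 1))    # accelerometer sample input tensor

# predict() takes one ndarray (or image) per input tensor and returns one
# ndarray per output tensor, e.g. bounding boxes plus a scene description vector.
outputs = net.predict([img_in, audio_in, accel_in])
print(len(outputs))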
The key to making this work is that each input and output tensor is a numpy ndarray of up to 4 dimensions. This lets us move the model post-processing code to Python, but with vector processing acceleration under the hood, so you can process the massive output vectors produced by machine learning models. For example, let’s say you have an array of 6300 output tuples where each tuple is (xmin, ymin, xmax, ymax, score), and you want to select all rows of the output where the score is greater than 0.5. In pure Python, you have to write a for loop to do this, and it can literally take seconds if the code is not written carefully (not kidding). However, using numpy, you can threshold each score value and get an array of the valid indices by doing
np.nonzero(np.asarray(output[:, 4] > 0.5))
which is executed in C under the hood at 50x the speed. This makes composing the pre/post-processing of tensors possible in Python.
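Putting that together, here’s a minimal sketch with a made-up output array:

from ulab import numpy as np

# A made-up (N, 5) output of (xmin, ymin, xmax, ymax, score) rows.
output = np.array([[0.0, 0.0, 10.0, 10.0, 0.9],
                   [5.0, 5.0, 20.0, 20.0, 0.3],
                   [2.0, 2.0,  8.0,  8.0, 0.7]])

# The pure-Python way: loop over every row and test the score (slow at scale).
slow = [i for i in range(len(output)) if output[i, 4] > 0.5]

# The numpy way from above: the comparison and index extraction run in C.
fast = np.nonzero(np.asarray(output[:, 4] > 0.5))

print(slow, fast)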
Anyway, that’s why we made all the changes. You can run any model and pre/post-process the output in Python, which unblocks everyone from running whatever they want on an OpenMV Cam without having to write custom C code to do the work.
For more information about the new API please read the documentation: ml — Machine Learning — MicroPython 1.23 documentation (openmv.io). In the rest of this forum post I will walk you through how to port your Edge Impulse model to run using the new API.
Previously, we only supported models generated by Edge Impulse; moving forward, bring whatever you want! Since you can pre/post-process your model’s inputs and outputs in Python using numpy, there’s no limit to what you can do.
However, given the API changes, here’s how you need to modify your Edge Impulse code to work with the new API:
Current scripts generated by Edge Impulse look like this (as of 7/25/2024):
# Edge Impulse - OpenMV Image Classification Example
import sensor, image, time, os, tf, uos, gc

sensor.reset()                         # Reset and initialize the sensor.
sensor.set_pixformat(sensor.RGB565)    # Set pixel format to RGB565 (or GRAYSCALE)
sensor.set_framesize(sensor.QVGA)      # Set frame size to QVGA (320x240)
sensor.set_windowing((240, 240))       # Set 240x240 window.
sensor.skip_frames(time=2000)          # Let the camera adjust.

net = None
labels = None

try:
    # load the model, alloc the model file on the heap if we have at least 64K free after loading
    net = tf.load("trained.tflite", load_to_fb=uos.stat('trained.tflite')[6] > (gc.mem_free() - (64*1024)))
except Exception as e:
    print(e)
    raise Exception('Failed to load "trained.tflite", did you copy the .tflite and labels.txt file onto the mass-storage device? (' + str(e) + ')')

try:
    labels = [line.rstrip('\n') for line in open("labels.txt")]
except Exception as e:
    raise Exception('Failed to load "labels.txt", did you copy the .tflite and labels.txt file onto the mass-storage device? (' + str(e) + ')')

clock = time.clock()
while(True):
    clock.tick()

    img = sensor.snapshot()

    # default settings just do one detection... change them to search the image...
    for obj in net.classify(img, min_scale=1.0, scale_mul=0.8, x_overlap=0.5, y_overlap=0.5):
        print("**********\nPredictions at [x=%d,y=%d,w=%d,h=%d]" % obj.rect())
        img.draw_rectangle(obj.rect())
        # This combines the labels and confidence values into a list of tuples
        predictions_list = list(zip(labels, obj.output()))

        for i in range(len(predictions_list)):
            print("%s = %f" % (predictions_list[i][0], predictions_list[i][1]))

    print(clock.fps(), "fps")
You should change the script to this:
# Edge Impulse - OpenMV Image Classification Example
import sensor, image, time, os, ml, uos, gc
from ulab import numpy as np

sensor.reset()                         # Reset and initialize the sensor.
sensor.set_pixformat(sensor.RGB565)    # Set pixel format to RGB565 (or GRAYSCALE)
sensor.set_framesize(sensor.QVGA)      # Set frame size to QVGA (320x240)
sensor.set_windowing((240, 240))       # Set 240x240 window.
sensor.skip_frames(time=2000)          # Let the camera adjust.

net = None
labels = None

try:
    # load the model, alloc the model file on the heap if we have at least 64K free after loading
    net = ml.Model("trained.tflite", load_to_fb=uos.stat('trained.tflite')[6] > (gc.mem_free() - (64*1024)))
except Exception as e:
    print(e)
    raise Exception('Failed to load "trained.tflite", did you copy the .tflite and labels.txt file onto the mass-storage device? (' + str(e) + ')')

try:
    labels = [line.rstrip('\n') for line in open("labels.txt")]
except Exception as e:
    raise Exception('Failed to load "labels.txt", did you copy the .tflite and labels.txt file onto the mass-storage device? (' + str(e) + ')')

clock = time.clock()
while(True):
    clock.tick()

    img = sensor.snapshot()

    predictions_list = list(zip(labels, net.predict([img])[0].flatten().tolist()))

    for i in range(len(predictions_list)):
        print("%s = %f" % (predictions_list[i][0], predictions_list[i][1]))

    print(clock.fps(), "fps")
Here are the changes:
- The tf module was replaced with the ml module. Also, I import numpy from the ulab module; this is not needed in this particular script per se, but it will be necessary to call numpy functions.
- To load a model you now directly create a Model() object instead of calling load(). We’ve kept the load_to_fb argument so that you can still load models onto the frame buffer stack if you are on an OpenMV Cam without SDRAM, but you should not continue to use this argument unless you absolutely need it. Note that Model() objects automatically free their associated memory in the heap (or whatever space they used on the frame buffer stack) when deleted.
- classify() has been removed. The sliding window approach it used, while interesting, was so slow as to be unusable. There is only one inference method now, predict(), which takes a list of input ndarrays or image objects and returns a list of ndarrays. The list of inputs must be the same size as the number of tensor inputs the model expects.
- On output, predict() returns a list of ndarrays equal to the number of tensor outputs. This classification model accepts one input (an image) and outputs one tensor (the list of class scores), so we do [0] to grab the single output tensor.
- The output tensor of the classification model has a shape of (1, x), where x is the number of classes you are looking at. If you convert this into a list directly from the ndarray you’ll get two levels of lists [[..., ..., ...]], so you have to flatten it first (i.e. remove the extra dimension) before converting the ndarray into the list of floats that gets zipped with the labels (see the short sketch after this list).
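Pulled out of the loop above, here’s a short sketch of what that last step is doing (the shapes in the comments follow the description above):

out = net.predict([img])[0]        # single output tensor with shape (1, num_classes)
scores = out.flatten().tolist()    # flatten to 1-D and convert to a list of floats
predictions_list = list(zip(labels, scores))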
As you can see, the new API is actually less code than the previous API.
Question: I thought the predict() call only accepted ndarrays?
Answer: Yes, it does. However, in Python we automatically detect when image objects are passed to predict() and convert them to ndarrays on the fly for you using the Normalization object. This is a pre-processing class we made for handling images, and the new API allows you to add pre-processing classes for whatever you like. Note that Normalization leverages fast C code under the hood to convert the image to an ndarray, so you should not notice any loss in speed.
Moving on, for object detection the current Edge Impulse code looks like this (as of 7/25/2024):
# Edge Impulse - OpenMV Object Detection Example
import sensor, image, time, os, tf, math, uos, gc

sensor.reset()                         # Reset and initialize the sensor.
sensor.set_pixformat(sensor.RGB565)    # Set pixel format to RGB565 (or GRAYSCALE)
sensor.set_framesize(sensor.QVGA)      # Set frame size to QVGA (320x240)
sensor.set_windowing((240, 240))       # Set 240x240 window.
sensor.skip_frames(time=2000)          # Let the camera adjust.

net = None
labels = None
min_confidence = 0.5

try:
    # load the model, alloc the model file on the heap if we have at least 64K free after loading
    net = tf.load("trained.tflite", load_to_fb=uos.stat('trained.tflite')[6] > (gc.mem_free() - (64*1024)))
except Exception as e:
    raise Exception('Failed to load "trained.tflite", did you copy the .tflite and labels.txt file onto the mass-storage device? (' + str(e) + ')')

try:
    labels = [line.rstrip('\n') for line in open("labels.txt")]
except Exception as e:
    raise Exception('Failed to load "labels.txt", did you copy the .tflite and labels.txt file onto the mass-storage device? (' + str(e) + ')')

colors = [  # Add more colors if you are detecting more than 7 types of classes at once.
    (255,   0,   0),
    (  0, 255,   0),
    (255, 255,   0),
    (  0,   0, 255),
    (255,   0, 255),
    (  0, 255, 255),
    (255, 255, 255),
]

clock = time.clock()
while(True):
    clock.tick()

    img = sensor.snapshot()

    # detect() returns all objects found in the image (splitted out per class already)
    # we skip class index 0, as that is the background, and then draw circles of the center
    # of our objects
    for i, detection_list in enumerate(net.detect(img, thresholds=[(math.ceil(min_confidence * 255), 255)])):
        if (i == 0): continue  # background class
        if (len(detection_list) == 0): continue  # no detections for this class?

        print("********** %s **********" % labels[i])
        for d in detection_list:
            [x, y, w, h] = d.rect()
            center_x = math.floor(x + (w / 2))
            center_y = math.floor(y + (h / 2))
            print('x %d\ty %d' % (center_x, center_y))
            img.draw_circle((center_x, center_y, 12), color=colors[i], thickness=2)

    print(clock.fps(), "fps", end="\n\n")
For object detection there are a lot more changes as this requires custom post-processing in Python:
# Edge Impulse - OpenMV Object Detection Example
import sensor, image, time, os, ml, math, uos, gc
from ulab import numpy as np

sensor.reset()                         # Reset and initialize the sensor.
sensor.set_pixformat(sensor.RGB565)    # Set pixel format to RGB565 (or GRAYSCALE)
sensor.set_framesize(sensor.QVGA)      # Set frame size to QVGA (320x240)
sensor.set_windowing((240, 240))       # Set 240x240 window.
sensor.skip_frames(time=2000)          # Let the camera adjust.

net = None
labels = None
min_confidence = 0.5

try:
    # load the model, alloc the model file on the heap if we have at least 64K free after loading
    net = ml.Model("trained.tflite", load_to_fb=uos.stat('trained.tflite')[6] > (gc.mem_free() - (64*1024)))
except Exception as e:
    raise Exception('Failed to load "trained.tflite", did you copy the .tflite and labels.txt file onto the mass-storage device? (' + str(e) + ')')

try:
    labels = [line.rstrip('\n') for line in open("labels.txt")]
except Exception as e:
    raise Exception('Failed to load "labels.txt", did you copy the .tflite and labels.txt file onto the mass-storage device? (' + str(e) + ')')

colors = [  # Add more colors if you are detecting more than 7 types of classes at once.
    (255,   0,   0),
    (  0, 255,   0),
    (255, 255,   0),
    (  0,   0, 255),
    (255,   0, 255),
    (  0, 255, 255),
    (255, 255, 255),
]

threshold_list = [(math.ceil(min_confidence * 255), 255)]

def fomo_post_process(model, inputs, outputs):
    ob, oh, ow, oc = model.output_shape[0]

    x_scale = inputs[0].roi[2] / ow
    y_scale = inputs[0].roi[3] / oh

    scale = min(x_scale, y_scale)

    x_offset = ((inputs[0].roi[2] - (ow * scale)) / 2) + inputs[0].roi[0]
    y_offset = ((inputs[0].roi[3] - (oh * scale)) / 2) + inputs[0].roi[1]

    l = [[] for i in range(oc)]

    for i in range(oc):
        img = image.Image(outputs[0][0, :, :, i] * 255)
        blobs = img.find_blobs(
            threshold_list, x_stride=1, y_stride=1, area_threshold=1, pixels_threshold=1
        )
        for b in blobs:
            rect = b.rect()
            x, y, w, h = rect
            score = (
                img.get_statistics(thresholds=threshold_list, roi=rect).l_mean() / 255.0
            )
            x = int((x * scale) + x_offset)
            y = int((y * scale) + y_offset)
            w = int(w * scale)
            h = int(h * scale)
            l[i].append((x, y, w, h, score))
    return l

clock = time.clock()
while(True):
    clock.tick()

    img = sensor.snapshot()

    for i, detection_list in enumerate(net.predict([img], callback=fomo_post_process)):
        if i == 0: continue  # background class
        if len(detection_list) == 0: continue  # no detections for this class?

        print("********** %s **********" % labels[i])
        for x, y, w, h, score in detection_list:
            center_x = math.floor(x + (w / 2))
            center_y = math.floor(y + (h / 2))
            print(f"x {center_x}\ty {center_y}\tscore {score}")
            img.draw_circle((center_x, center_y, 12), color=colors[i])

    print(clock.fps(), "fps", end="\n\n")
Okay, here’s what’s going on:
- As with the classification code before, you need to change the tf module to the ml module, import ulab if you plan to use numpy functions, and change tf.load to ml.Model.
- After that, we add a custom post-processing function to handle the output of the FOMO model. Previously, detect() ran all this logic in C. While this was sweet, it meant that precious firmware space was used on all OpenMV Cams for a baked-in detect() method, which may not be what you want to use. Now the post-processing is in Python. We can do this through the callback argument built into predict(). Let’s walk through the code:
# The post-processing callback receives the model object, the input list, and the output list.
# We designed it this way so that this callback function could be included in a library in the
# future that you load as a module. Like "from ei import fomo_post_process".
def fomo_post_process(model, inputs, outputs):
    # This unpacks the single output tensor from FOMO's shape.
    # It has batches (1), height, width, and channels for each object class.
    ob, oh, ow, oc = model.output_shape[0]

    # For image arguments to predict() the roi of the image being processed is available.
    # We get it by grabbing the single input at [0] and getting the ROI object there.
    x_scale = inputs[0].roi[2] / ow
    y_scale = inputs[0].roi[3] / oh

    # This computes the x/y scale difference between the ROI and the output
    # tensor's width/height. We can map back to the input image using this.
    scale = min(x_scale, y_scale)

    # In the case the input image gets cropped when given to the model input,
    # we need to compute the x/y offset to map it back (plus the ROI offset).
    x_offset = ((inputs[0].roi[2] - (ow * scale)) / 2) + inputs[0].roi[0]
    y_offset = ((inputs[0].roi[3] - (oh * scale)) / 2) + inputs[0].roi[1]

    # Create a list of lists, one per class output.
    l = [[] for i in range(oc)]

    # FOMO outputs an activation map for each class.
    for i in range(oc):
        # The image object now supports creating images from ndarrays. The code below is
        # like magic. What's happening is that we are selecting the output tensor [0] (there's
        # only 1 output tensor for FOMO), then grabbing batch 0 (the only one), every pixel
        # of the height and width dimensions, and the target class we are looking for. This
        # shows the power of ndarrays: a very complex operation in one line. Note that the
        # array is sliced using numpy, so no copy is made to create the slice. Finally, the
        # sliced array holds floats (0 to 1). We need to make it (0 to 255) to create a
        # GRAYSCALE image, so we multiply all pixels by 255 and then cast it to an image.
        # The image lib automatically interprets (h, w) ndarrays as GRAYSCALE images.
        img = image.Image(outputs[0][0, :, :, i] * 255)

        # Next, find blobs above the threshold value.
        blobs = img.find_blobs(
            threshold_list, x_stride=1, y_stride=1, area_threshold=1, pixels_threshold=1
        )

        # Then for all the blobs found...
        for b in blobs:
            rect = b.rect()
            x, y, w, h = rect

            # Extract the brightness of the pixels in the blob to create a score.
            score = (
                img.get_statistics(thresholds=threshold_list, roi=rect).l_mean() / 255.0
            )

            # And then map the blobs back to the input image.
            x = int((x * scale) + x_offset)
            y = int((y * scale) + y_offset)
            w = int(w * scale)
            h = int(h * scale)

            # And add them to their class's score list.
            l[i].append((x, y, w, h, score))

    # Return a list of classes, each of which has a list of (x, y, w, h, score).
    return l
Wow! That was a lot that detect() used to do. But now, with the code in Python, if you need to run a modified model with a slightly different output than detect() expected, you aren’t out of luck anymore. You can change what’s going on during post-processing (there’s a small sketch of a custom callback after the walkthrough below).
- Finally, the rest of the code looks very similar to how we processed the output of detect().
# predict() just returns whatever the post-processing callback wants to return.
for i, detection_list in enumerate(net.predict([img], callback=fomo_post_process)):
    if i == 0: continue  # background class
    if len(detection_list) == 0: continue  # no detections for this class?

    print("********** %s **********" % labels[i])
    # This is the output we added to each list back in the post-processing method.
    for x, y, w, h, score in detection_list:
        center_x = math.floor(x + (w / 2))
        center_y = math.floor(y + (h / 2))
        print(f"x {center_x}\ty {center_y}\tscore {score}")
        img.draw_circle((center_x, center_y, 12), color=colors[i])
But now, it is more Pythonic, without the weird output class that was just a named tuple.
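And since predict() returns whatever your callback returns, you’re free to write a completely different post-processing function for your own model. Here’s a minimal sketch; the 0.5 threshold and the (class index, score) return format are made up for illustration, while the (model, inputs, outputs) signature and the callback argument are as described above:

def my_post_process(model, inputs, outputs):
    # Grab the single output tensor and flatten it into a 1-D list of scores.
    scores = outputs[0].flatten().tolist()
    # Return whatever you want predict() to hand back, e.g. (class index, score) pairs.
    return [(i, s) for i, s in enumerate(scores) if s > 0.5]

results = net.predict([img], callback=my_post_process)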
Note: you may notice this alternative version of fomo_post_process() in our examples:
def fomo_post_process(model, inputs, outputs):
    n, oh, ow, oc = model.output_shape[0]

    nms = NMS(ow, oh, inputs[0].roi)

    for i in range(oc):
        img = image.Image(outputs[0][0, :, :, i] * 255)
        blobs = img.find_blobs(
            threshold_list, x_stride=1, area_threshold=1, pixels_threshold=1
        )
        for b in blobs:
            rect = b.rect()
            x, y, w, h = rect
            score = (
                img.get_statistics(thresholds=threshold_list, roi=rect).l_mean() / 255.0
            )
            nms.add_bounding_box(x, y, x + w, y + h, score, i)

    return nms.get_bounding_boxes()
We designed an NMS object to take care of the details of dealing with overlapping bounding boxes and of mapping detections from the output tensor back to the input image. However, this class was designed before we decided to switch to ndarrays for everything, so its API is not suitable going forward and it will be refactored: as it stands, it can’t easily leverage numpy vectorization for larger models that output thousands of detections. You should avoid using it if you don’t want your code breaking again.
Thank you for reading this massive forum post. There is no doubt there will be bugs to fix as folks start testing things, and we’ll get those fixed quickly. However, the API should be stable now.
Please ask questions.