I’m working on a demo with the OpenMV H7 using the tf (TensorFlow Lite) package. When I call
tf.classify(net, img), how is scaling or cropping performed on the image?
For example, I have a CNN trained on Fashion-MNIST with an input tensor of 28x28. I have a 240x240 window as my input image and ROI in OpenMV. When this image data is sent to tf.classify(), does tf.classify() automatically scale this image to match the required input tensor (28x28)? If so, what scaling technique is used (e.g. area-averaging)?
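To make clear what I mean by “area-averaging,” here’s a plain-Python sketch (my own illustration, not OpenMV’s actual implementation): each output pixel is the mean of the block of source pixels it covers. Since 240→28 is not an integer ratio, the sketch assumes an integer downscale factor for simplicity.

```python
# Hypothetical illustration of area-averaged downscaling.
# NOT OpenMV code -- just shows what the technique would do.
def area_average(img, factor):
    # img is a 2D list of grayscale pixel values.
    h = len(img) // factor
    w = len(img[0]) // factor
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Average the factor x factor source block.
            s = 0
            for dy in range(factor):
                for dx in range(factor):
                    s += img[y * factor + dy][x * factor + dx]
            out[y][x] = s // (factor * factor)
    return out
```

In contrast, nearest-neighbor scaling would just pick one pixel per block, which tends to lose fine texture in small targets like Fashion-MNIST garments.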
Next, tf.classify() still seems to work even when the input image does not have the same aspect ratio as the input tensor. For example, if I change the sensor to
sensor.set_windowing((120, 240)), tf.classify() happily accepts the image data. From what I can tell, the image is scaled so that its smallest dimension matches the tensor, and the rest is cropped. So, the 120x240 image is scaled to 28x56, and the top 14 rows and bottom 14 rows are cropped out, leaving a 28x28 px square image. Do I have this right?
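The arithmetic of my scale-then-center-crop hypothesis can be sketched like this (the function name and rounding are my assumptions, not confirmed OpenMV behavior):

```python
# Hypothesis: scale so the smaller dimension matches the tensor size,
# then center-crop the longer dimension. Not confirmed OpenMV behavior.
def scale_and_crop_dims(src_w, src_h, dst=28):
    # Scale factor that maps the smaller dimension onto dst pixels.
    scale = dst / min(src_w, src_h)
    scaled_w = round(src_w * scale)
    scaled_h = round(src_h * scale)
    # Pixels to trim from each side of the longer dimension.
    crop_x = (scaled_w - dst) // 2
    crop_y = (scaled_h - dst) // 2
    return scaled_w, scaled_h, crop_x, crop_y

print(scale_and_crop_dims(120, 240))  # (28, 56, 0, 14): 14 rows cropped top and bottom
print(scale_and_crop_dims(240, 240))  # (28, 28, 0, 0): no crop needed for a square input
```

If this is right, a 120x240 window throws away the top and bottom quarters of the scaled image, which matters if the garment isn’t centered in the frame.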
Here is my code. Please note that “trained.tflite” is a CNN trained in Edge Impulse with Fashion-MNIST.
```python
# Edge Impulse - OpenMV Image Classification Example
import sensor, image, time, os, tf

sensor.reset()                          # Reset and initialize the sensor.
sensor.set_pixformat(sensor.GRAYSCALE)  # Set pixel format to GRAYSCALE.
sensor.set_framesize(sensor.QVGA)       # Set frame size to QVGA (320x240).
sensor.set_windowing((120, 240))        # Set 120x240 window.
sensor.skip_frames(time=2000)           # Let the camera adjust.

net = "trained.tflite"
labels = [line.rstrip('\n') for line in open("labels.txt")]

clock = time.clock()
while(True):
    clock.tick()
    img = sensor.snapshot()

    # Default classify: perform one inference on the whole image.
    obj = tf.classify(net, img)
    predictions_list = list(zip(labels, obj.output()))

    # Take the max by confidence score (x[1]), not by the (label, score) tuple.
    print(max(predictions_list, key=lambda x: x[1]))
    print(clock.fps(), "fps")
```