Resize the image to give it as an input to a neural network

I have an OpenMV Cam H7, and I have two questions regarding TFLite on OpenMV:

  1. I built a model using Keras and quantized it with TFLite; it has an accuracy of 96% after quantization. It takes grayscale images of 28x28 pixels as input. How do I provide this input to the neural network in OpenMV? I want the image to be scaled properly to 28x28.

  2. If I build a model that works on grayscale images with values between 0 and 1 rather than 0-255, how do I give this as an input in OpenMV? OpenMV provides grayscale images, but I want them to have values between 0 and 1. Is there a way to multiply each pixel by 1/255?


    Thanks

Our TF module takes care of all of these details for you. You just need to load the network and run it.

  1. Our TF module does this: it will crop and scale the input image to whatever size the network takes while maintaining the aspect ratio.

  2. Our TF module scales the inputs and outputs of the network to what is needed. However, if a pixel value is between 0 and 1, it should be a float.

We recommend using Edge Impulse to generate and train your network online.
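For reference, here is a minimal sketch of what "load and run" looks like for a 28x28 grayscale model. The file names trained.tflite and labels.txt are assumptions here (they match the Edge Impulse export used later in this thread):

# Minimal sketch (not a verified example): load a quantized 28x28 grayscale model
# and let the TF module handle cropping/scaling and input/output scaling.
import sensor, image, time, tf

sensor.reset()
sensor.set_pixformat(sensor.GRAYSCALE)   # grayscale frames for a grayscale model
sensor.set_framesize(sensor.QVGA)
sensor.skip_frames(time=2000)

net = tf.load("trained.tflite")          # model file name is an assumption
labels = [line.rstrip('\n') for line in open("labels.txt")]

clock = time.clock()
while True:
    clock.tick()
    img = sensor.snapshot()
    # classify() crops and scales the frame to the model's input size internally
    for obj in net.classify(img):
        print(list(zip(labels, obj.output())))
    print(clock.fps(), "fps")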

Hello, digging up this topic to get an understanding of how these TF module functions work. It is great to read that the TF module handles a lot for us (though, confusingly, this is not documented in the tf.classify help). Lately I have been using an Edge Impulse-generated model to detect a black sponge and a green USB reader on a whiteboard. I classify images of variable resolutions and aspect ratios.

Under-scaled input images (relative to the model's 160 px resolution) significantly reduced model performance, as expected, and over-sampled images had no detectable effect, as expected from the above. Surprisingly, however, non-rescaled images (543x502, close to square proportions) have terrible (null) performance, contrary to the assumption that the TF module handles everything, even though they are close in resolution to the images that I re-scaled to 480x480, for instance.
[image: classification results]

Results above are based on samples of ~20 live image classifications of the target. The model usually has 100% accuracy for classifying the green USB reader. I used nearest neighbor interpolation. This is a sample image of what I was trying to classify (a green USB flash reader on a whiteboard):
[image: green USB flash reader on a whiteboard]

I attach my script here, based on the one exported by Edge Impulse:

# Edge Impulse - OpenMV Image Classification Example

import sensor, image, time, os, tf, uos, gc

sensor.reset()                         # Reset and initialize the sensor.
sensor.set_pixformat(sensor.RGB565)    # Set pixel format to RGB565 (or GRAYSCALE)
sensor.set_framesize(sensor.WQXGA2)    # Set frame size to WQXGA2 (2592x1944)
#sensor.set_windowing((240, 240))       # Set 240x240 window.
sensor.skip_frames(time=2000)          # Let the camera adjust.

net = None
labels = None
side_res = "original"  # set to a pixel size (e.g. 480) to rescale the ROI before classifying

try:
    # load the model, alloc the model file on the heap if we have at least 64K free after loading
    net = tf.load("trained.tflite", load_to_fb=uos.stat('trained.tflite')[6] > (gc.mem_free() - (64*1024)))
except Exception as e:
    print(e)
    raise Exception('Failed to load "trained.tflite", did you copy the .tflite and labels.txt file onto the mass-storage device? (' + str(e) + ')')

try:
    labels = [line.rstrip('\n') for line in open("labels.txt")]
except Exception as e:
    raise Exception('Failed to load "labels.txt", did you copy the .tflite and labels.txt file onto the mass-storage device? (' + str(e) + ')')

clock = time.clock()
while(True):
    clock.tick()

    img = sensor.snapshot()

    # Optionally rescale the target ROI ourselves before classification
    # (set side_res above to a pixel size, e.g. 480, instead of "original").
    if str(side_res) != "original":
        img.scale(x_size=side_res, y_size=side_res, roi=(1340, 1215, 543, 502))

    # default settings just do one detection... change them to search the image...
    # MANUALLY comment/uncomment the following line to use the original vs. the re-scaled image
    #for obj in net.classify(img, roi=(1340, 1215, 543, 502), min_scale=1.0, scale_mul=0.8, x_overlap=0.5, y_overlap=0.5):
    for obj in net.classify(img, min_scale=1.0, scale_mul=0.8, x_overlap=0.5, y_overlap=0.5):
        print("**********\nPredictions at [x=%d,y=%d,w=%d,h=%d]" % obj.rect())
        img.draw_rectangle(obj.rect())
        # This combines the labels and confidence values into a list of tuples
        predictions_list = list(zip(labels, obj.output()))
        # Append the confidence and label for class index 1, plus the resolution setting, to a CSV log
        with open('confidences.csv', 'a') as confidencelog:
            confidencelog.write(str(predictions_list[1][1]) + ',' + str(predictions_list[1][0]) + ',' + str(side_res) + '\n')
        for i in range(len(predictions_list)):
            print("%s = %f" % (predictions_list[i][0], predictions_list[i][1]))

    print(clock.fps(), "fps")

After discussing with Edge Impulse, it seems the tf.classify function just feeds the image to the model, so we need to rescale it ourselves. Could it be worth clarifying what the TF module actually does?

The tf function automatically bilinearly scales the input ROI (which is the whole image if not specified) to whatever size the CNN accepts.
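For example, something like this should work (a sketch using the ROI coordinates from your script): pass the ROI straight to classify() and the module crops and scales it itself, so no img.scale() call is needed beforehand.

# Let the TF module crop and bilinearly scale the ROI itself,
# instead of pre-scaling with img.scale() (ROI coordinates taken from the script above).
for obj in net.classify(img, roi=(1340, 1215, 543, 502)):
    print(list(zip(labels, obj.output())))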

I see. Where does the discrepancy in the above results come from, then?

Can you file an issue on the GitHub issue tracker?