Problem with input type, machine learning with Edge Impulse

Hi, my program recognizes the letters H, S, and U with machine learning, done with Edge Impulse, using datasets of 96x96 grayscale images. I use an OpenMV H7 Plus. It tells me that the data I provide is not the type it expects. This is the part of the program that gives the error and makes the prediction:
def predict(roi_img):
    normalized_img = sensor.snapshot()
    input_data = image.binary_to_grayscale()

    classification_result = net.predict([input_data])

    prediction_index = classification_result[0].index(max(classification_result[0]))  # Take the index of the class with the highest score
    return labels[prediction_index]

Please, can someone help me?

Please print(classification_result)

It’s going to be a list of ndarrays. See the updated image classification script: openmv/scripts/examples/03-Machine-Learning/00-TensorFlow/tf_image_classification.py at master · openmv/openmv
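For reference, a minimal sketch of inspecting that output, assuming net, input_data, and labels are set up as in the post above; the return value is a list with one ndarray per model output:

classification_result = net.predict([input_data])
print(classification_result)                         # e.g. [array([...], dtype=float32)]
scores = classification_result[0].flatten().tolist() # flatten the first output to a plain list
best = scores.index(max(scores))                     # index of the top-scoring class
print(labels[best], scores[best])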

Hello, now the program no longer gives errors. I did the machine learning with Edge Impulse:

  1. In the object detection section I used 50 training cycles, because otherwise it didn't work, with FOMO MobileNetV2 0.1; this gave me an F1 score of 99.7%.
  2. In the image section, the numbers under "raw features" are all 0x0, 0x0, 0x0, etc., all the same; I don't know what this means. I set the color depth to grayscale since I only use black and white. Under "processed features" it is also all 0.0000, 0.0000, etc.
  3. In "create impulse" I set 96x96 images with "fit longest axis", then I chose an object detection learning block, but I don't know whether the other parameters are as good.

I remind you that I want to recognize the black letters H, S, and U on a white background with an OpenMV H7 Plus. Model testing gave an accuracy greater than 97%. Now my problem is that whatever I put in front of the camera, it detects the blob but identifies it as the background at 100%. I use a wide-angle lens since my robot can be very close to the walls with the letters. This is my code:

import sensor, image, time, ml, os

# Initialize the sensor
sensor.reset()
sensor.set_pixformat(sensor.GRAYSCALE)  # Use grayscale
sensor.set_framesize(sensor.QQVGA)      # QQVGA resolution (160x120)
sensor.skip_frames(time=2000)           # Wait for the sensor to settle
clock = time.clock()

# File paths
model_file = "/model/trained.tflite"
labels_file = "/model/labels.txt"

# Check that the files exist
def file_exists(filepath):
    directory = "/".join(filepath.split("/")[:-1])
    filename = filepath.split("/")[-1]
    try:
        return filename in os.listdir(directory)
    except OSError as e:
        print(f"Errore: {e}")
        return False

if not file_exists(model_file):
    raise OSError(f"File non trovato: {model_file}")
if not file_exists(labels_file):
    raise OSError(f"File non trovato: {labels_file}")

# Carica il modello e le etichette
model = ml.Model(model_file, load_to_fb=True)
with open(labels_file, 'r') as f:
    labels = f.read().splitlines()
    if not labels:
        raise ValueError("The labels file is empty.")

# Main loop
while True:
    clock.tick()
    img = sensor.snapshot()
    # Draw white rectangles to "shrink" the visible area
    img.draw_rectangle(0, 0, 160, 20, (255), fill=True)      # Top border
    img.draw_rectangle(0, 100, 160, 20, (255), fill=True)    # Bottom border (changed from 120 to 100)
    img.draw_rectangle(0, 20, 21, 120, (255), fill=True)     # Left border
    img.draw_rectangle(139, 20, 21, 120, (255), fill=True)   # Right border
    img.gamma_corr(contrast=1.5, brightness=-0.2)

    # Find blobs
    blobs = img.find_blobs([(0, 50, -50, 50, -50, 50)], pixels_threshold=20, area_threshold=20, merge=False)
    if blobs:
        largest_blob = max(blobs, key=lambda b: b.pixels())
        x, y, w, h = largest_blob.rect()

        # Draw the blob
        img.draw_rectangle(largest_blob.rect())
        img.draw_cross(largest_blob.cx(), largest_blob.cy())

        # Crop the ROI
        roi = (x, y, w, h)
        cropped_img = img.copy(roi=roi)

        # Prediction
        prediction_result = model.predict([cropped_img])
        if prediction_result:
            # Combine labels and scores
            scores = sorted(
                zip(labels, prediction_result[0].flatten().tolist()),
                key=lambda x: x[1],
                reverse=True
            )
            label, confidence = scores[0]
            print(f"Predizione: {label} (Confidenza: {confidence:.2f})")
        else:
            print("Errore: Il modello non ha prodotto alcun risultato.")
    else:
        print("Nessun blob trovato.")

Hi, when you crop the image to feed into the CNN, this changes what it was trained on. CNNs do not necessarily generalize: you need to 100% match the training data to what the network will see in the real world, unless you have an extremely large number of samples.

Given that you are cropping the image you run the detection on, this is likely what is causing the mismatch.

  1. Make sure that the crops are some multiple of 96x96.
  2. Make sure that the training data is produced from those crops you are taking.

So, I’d use our dataset editor tool and update the default script it has with your cropping code.
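A hedged sketch of point 1, assuming largest_blob and img from your loop; the rounding and clamping details here are illustrative, not from the dataset editor script:

x, y, w, h = largest_blob.rect()
side = max(w, h)
side = ((side + 95) // 96) * 96                  # round up to the next multiple of 96
side = min(side, img.width(), img.height())      # at QQVGA only 96 fits
x = min(max(0, x - (side - w) // 2), img.width() - side)   # center on the blob, clamp to the frame
y = min(max(0, y - (side - h) // 2), img.height() - side)
cropped_img = img.copy(roi=(x, y, side, side))   # this square then scales cleanly onto the 96x96 input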

I have a QQVGA image resolution, so it is impossible for me to apply the model since I have rectangular images. If I always make the detected blob a square, is there a way to show only a square as the image, like for example your smile detection model that you published on YouTube, in which the image was square? I always want to work with low square resolutions since I have to recognize simple letters.

Yes, if you crop the main frame buffer to the detected rect, then it will show in the IDE.

I've tried, but:

import sensor, image, time

sensor.reset()
sensor.set_pixformat(sensor.GRAYSCALE)  
sensor.set_framesize(sensor.QQVGA)    
sensor.skip_frames(time=2000)           
clock = time.clock()

while True:
    clock.tick()
    
    img = sensor.snapshot()

    blobs = img.find_blobs([(0, 50, -50, 50, -50, 50)], pixels_threshold=50, area_threshold=50, merge=True)

    if blobs:
        largest_blob = max(blobs, key=lambda b: b.pixels())
        x, y, w, h = largest_blob.rect()
        sensor.set_windowing((x, y, w, h))

        if w % 2 != 0:
            w -= 1
        if h % 2 != 0:
            h -= 1

    print("FPS:", clock.fps())
    time.sleep(0.1)

This program sometimes crops the already-cropped image, and sometimes returns the error "RuntimeError: Frame size is not supported or is not set." at the line img = sensor.snapshot().
What am I doing wrong?

Don’t use set_windowing. That controls the frame capture from the camera.

Do:

largest_blob = max(blobs, key=lambda b: b.pixels())
img.crop(roi=largest_blob.rect())
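For reference, a minimal sketch of the corrected loop, reusing the thresholds from your earlier code; cropping the snapshot in place leaves the capture window alone, so snapshot() keeps returning full frames:

while True:
    clock.tick()
    img = sensor.snapshot()
    blobs = img.find_blobs([(0, 50, -50, 50, -50, 50)], pixels_threshold=50, area_threshold=50, merge=True)
    if blobs:
        largest_blob = max(blobs, key=lambda b: b.pixels())
        img.crop(roi=largest_blob.rect())  # crop in place; the IDE shows the cropped frame
    print("FPS:", clock.fps())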

OK, I resolved this problem, but now I need the cropped image resized to 25x25. If I use img.resize(25, 25) I get: Error: 'Image' object has no attribute 'resize'.
How can I solve that?

There’s a scale function you can use instead: image — machine vision — MicroPython 1.23 documentation

        img.scale(0.25, 0.25, roi=(square_x, square_y, size, size), rgb_channel=-1, alpha=256, color_palette=None, alpha_palette=None, hint=0, copy=False, copy_to_fb=False)

Is this the correct form? How can I reduce the image resolution to 25x25?

Hi, you just specify the x_scale and y_scale arguments to scale or crop to produce a 25x25 image.

You can’t specify the resolution directly, just how much to scale by. Scaling is applied after cropping, so the scale factors should be based on the cropped image width/height that you want to scale to 25x25.
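Concretely, a small sketch assuming the crop is already a square with side size, as in your snippet above; only the 25 / size ratio matters:

img.crop(roi=(square_x, square_y, size, size))   # square crop first
img.scale(x_scale=25 / size, y_scale=25 / size)  # then shrink that square to 25x25
print(img.width(), img.height())                 # should print 25 25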

OK, I retrained the model with Edge Impulse with 4000+ images. This is my final code:

import sensor, image, time, ml, os

# Camera initialization
try:
    sensor.reset()  # Reset the camera
    sensor.set_pixformat(sensor.GRAYSCALE)  # Set the image format to grayscale
    sensor.set_framesize(sensor.QQVGA)  # Set the resolution to QQVGA (160x120)
    sensor.skip_frames(time=2000)  # Wait 2 seconds for the camera to configure
    print("Camera initialized successfully.")
except Exception as e:
    print("Error during camera initialization:", e)
    raise

clock = time.clock()

# File paths
model_file = "/model/trained.tflite"
labels_file = "/model/labels.txt"

# Load the model and labels
model = ml.Model(model_file, load_to_fb=True)
with open(labels_file, 'r') as f:
    labels = f.read().splitlines()

while True:
    clock.tick()

    try:
        # Capture an image
        img = sensor.snapshot()

        # Preprocess the image
        img.gamma_corr(contrast=1.5, brightness=-0.2)

        # Blob detection
        blobs = img.find_blobs([(0, 50, -50, 50, -50, 50)], pixels_threshold=50, area_threshold=50, merge=True)

        if blobs:
            # Take the largest blob
            largest_blob = max(blobs, key=lambda b: b.pixels())
            x, y, w, h = largest_blob.rect()

            # Compute the side length of the square
            size = max(w, h)  # Side length of the square
            square_x = x - (size - w) // 2  # Center horizontally
            square_y = y - (size - h) // 2  # Center vertically

            # Adjust to stay within the image bounds
            square_x = max(0, square_x)
            square_y = max(0, square_y)
            if square_x + size > img.width():
                square_x = img.width() - size
            if square_y + size > img.height():
                square_y = img.height() - size

            # Draw the rectangle around the square
            img.draw_rectangle((square_x, square_y, size, size), color=255)  # White rectangle
            img.draw_cross(square_x + size // 2, square_y + size // 2, color=255)  # White cross

            # Crop the image to the square
            img.crop(roi=(square_x, square_y, size, size))  # Crop the square

            # Apply the model to the whole image
            prediction_result = model.predict([img])
            if prediction_result:
                # Combine labels and scores
                scores = sorted(
                    zip(labels, prediction_result[0].flatten().tolist()),
                    key=lambda x: x[1],
                    reverse=True
                )
                label, confidence = scores[0]
                print(clock.fps(), "fps\t", "%s = %f\t" % (label, confidence))

        else:
            print("Nessun blob rilevato.")

    except Exception as e:
        print("Errore:", e)

But this is my serial terminal output:
46.7293 fps background = 0.996094
46.7301 fps background = 0.996094
46.729 fps background = 0.996094
46.7297 fps background = 0.996094
46.7305 fps background = 0.996094
46.7293 fps background = 0.996094
46.7301 fps background = 0.996094
46.729 fps background = 0.996094
It never changes; it's the same problem as before. What should I do?

Hi, are the images you are capturing for training exactly the same as the images you are feeding the network when it's running? Also, how big is the input to the network? 25x25 is less than the typical 96x96 input window.
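One quick check, sketched under the assumption that your firmware's ml.Model exposes the tensor shapes (printing the model object also dumps a summary):

model = ml.Model("/model/trained.tflite", load_to_fb=True)
print(model)                 # dumps the model summary, including tensor shapes
print(model.input_shape)     # e.g. a 96x96x1 input for a 96x96 grayscale model
print(model.output_shape)
# If the input really is 96x96, a 25x25 crop scaled into it will not match
# the training distribution, which would explain the constant "background".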