CNN accuracy on M7


I’m using your M7 to run CNNs. Unfortunately I noticed that the performance of the models is not the same as during the test phase. For example, I trained a person detector (binary classification: person in the picture or not) on the COCO database. The accuracy on the evaluation set is ~80%. As input I use color images of 128x128x3. As network I use a CNN with 4 layers. After quantization I loaded the model onto the M7 and executed it. The model recognizes people in the camera image only in very few cases, i.e. the recognition rate is bad.

I did the following experiments to find the error: First I loaded 100 images of people from the COCO database onto the microcontroller and processed them one after the other through the model. Of these 100 persons only 11 were recognized.

As a further test I used the quantized Caffe model and wrote a small demo application. The pictures taken by my webcam serve as input. Subjectively, the recognition rate of this demo application is much better. So there seems to be an error during the transformation from the quantized Caffe model to the CMSIS code. Is this a known issue? Any ideas where the error could be?


Hi, the code that runs the CNNs is fine. However, if the resolution is set higher than the CNN model’s input size, the image is downscaled to the lower resolution using nearest neighbor. Are you running the camera in 128x128 pixel mode? Also, I’d be concerned about the lighting of the camera image. We lower the exposure on the camera and overclock it slightly to achieve higher FPS. However, you can force the exposure to be much longer using the set_auto_exposure() method, which will drastically improve the image quality. Poor image quality can have adverse effects on the CNN.

One note, running the camera in 128x128 mode causes a lot of blur. You may just want to crop 128x128 off of a larger resolution picture.
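To illustrate the cropping suggestion, here is a minimal sketch that computes a centered 128x128 window inside a larger frame. The helper name center_crop_window is hypothetical and not part of the OpenMV API; the QVGA size of 320x240 is taken from the sensor comments later in this thread.

```python
def center_crop_window(frame_w, frame_h, crop_w=128, crop_h=128):
    """Return (x, y, w, h) of a crop centered in a frame_w x frame_h frame.

    Hypothetical helper for illustration; not part of the OpenMV API.
    """
    if crop_w > frame_w or crop_h > frame_h:
        raise ValueError("crop is larger than the frame")
    x = (frame_w - crop_w) // 2   # left edge of the centered window
    y = (frame_h - crop_h) // 2   # top edge of the centered window
    return (x, y, crop_w, crop_h)


# QVGA is 320x240, so a centered 128x128 window starts at (96, 56).
roi = center_crop_window(320, 240)
print(roi)  # (96, 56, 128, 128)
```

On the camera, a tuple like this could presumably be passed to sensor.set_windowing() after sensor.set_framesize(sensor.QVGA), so the network sees unscaled pixels instead of a nearest-neighbor downscale.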

Here’s how the image gets brought in.

If you have ideas to improve this code we’d love to get more feedback on it. ARM honestly hasn’t been too helpful after releasing CMSIS-NN. I don’t know exactly if they are going to provide more support. I’ll know more in a few weeks. I’m going to this Tiny-ML conference in 2 weeks where I hear there’s going to be some big news.

Thanks for the response. As input sensor format I use VGA. From this picture I cut a window of 256 x 256. In addition, I set the exposure time manually. But that doesn’t seem to be the problem. As already mentioned, the model performance on images I read from the SD card is also bad. Here is the example of how I tested the recognition rate on data that was also present in the training material:

import sensor, image, time, pyb, os, nn

# load model
net = nn.load('/')
labels = ['no_person', 'person']

# Still need to init sensor
# Set sensor settings

# Set sensor pixel format

cnt = 0
for i in range(100):
    # Load image
    image_path = "/ppm_images/" + str(i) + "_1.ppm"
    img = image.Image(image_path, copy_to_fb=True)

    # Flush FB

    # Add a small delay to allow the IDE to read the flushed image.

    #img = sensor.snapshot()         # Take a picture and return the image.
    out = net.forward(img)
    max_idx = out.index(max(out))
    score = int(out[max_idx]*100)

    score_str = "%s:%d%% "%(labels[max_idx], score)

    if max_idx == 1:
        cnt += 1  # presumably: count images classified as 'person' (line missing from the post)

Furthermore, I have noticed that when a person is detected, the confidence is not very high (~51%). Also, I don’t understand the purpose of the softmax parameter. If I set it to True, the output is wrong: the values do not add up to one. I don’t think the error is in the CMSIS library. I have already used it in other projects without any problems. When I find the time I will also look at the code you sent me.
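For reference, here is a minimal sketch of what a softmax is expected to do: it maps raw scores to probabilities that sum to one. This is plain floating-point Python for illustration, not the nn module's or CMSIS-NN's fixed-point implementation, and the raw scores below are made up.

```python
import math


def softmax(scores):
    """Convert raw scores to probabilities that sum to 1."""
    m = max(scores)                            # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical raw outputs for ['no_person', 'person']
raw = [0.2, 0.3]
probs = softmax(raw)
print(probs, sum(probs))
```

Two things follow from this. If the on-device output with softmax enabled does not sum to (approximately) one, the values are probably not being scaled or interpreted as expected. And nearly equal raw scores give probabilities close to 50%, which would match a confidence of about 51%.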

Are you sure the labels are not inverted? 0 for person, 1 for something else? Note that some accuracy is lost when the model is quantized, but it shouldn’t be this bad.
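One quick way to check for inverted labels is to score the same predictions twice, once as-is and once with the two classes swapped. The sketch below assumes binary labels 0 and 1; the function names are hypothetical, and the example numbers reuse the 11-out-of-100 result reported above.

```python
def accuracy(predicted, truth):
    """Fraction of predictions that match the ground truth."""
    correct = sum(1 for p, t in zip(predicted, truth) if p == t)
    return correct / len(truth)


def check_label_inversion(predicted, truth):
    """Return (accuracy as-is, accuracy with predicted classes 0 and 1 swapped)."""
    normal = accuracy(predicted, truth)
    flipped = accuracy([1 - p for p in predicted], truth)
    return normal, flipped


# 100 person images (truth = 1), of which only 11 were predicted as class 1
truth = [1] * 100
predicted = [1] * 11 + [0] * 89
normal, flipped = check_label_inversion(predicted, truth)
print(normal, flipped)
```

A much higher flipped accuracy only points to swapped labels if the test set contains both classes. With person images alone, a model that always answers "no person" would produce the same numbers.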

Of course not…

Okay, can you provide everything we need to reproduce it? Ibrahim can test it out.

Hi, I’d like to add that I’ve been seeing the same thing. When running the CNN on the M7, I see poor results.

Can you provide some way for us to reproduce the issue?

Can you provide the training and test dataset along with the labels and everything we need to quickly run it through caffe?

I just shared a Google Drive to you at your email:


  • “” (you might need to delete the MACOSX folder after unzipping)
  • the “examples” folder
  • trashscan_solver.prototxt
  • trashscan_train_test.prototxt

You should already have the quantize and convert scripts. If not, they are on the drive too.

Note: the updated solver is on my other computer. All that is different is test_interval = 500, stepsize = 2500, max_iter = 10000

On my computer, I created a folder called “trashscan” and stored all these files inside. When running, I’m inside the “trashscan” directory. You shouldn’t get any path issues if you do that too. The rest of the steps are identical to your guide. However, if needed, I can provide them (there’s a word document in the drive with the steps but it isn’t updated with the paths and info…though, it explains which scripts to call and the order; only quantize and convert need arguments passed)

Additionally, I attached the code for my program on the M7 (7.53 KB)

Hi, can you re-send that? I accidentally deleted it because I wasn’t expecting it. Google just sent an email saying it shared a drive with me but there was no identifying info so it looked like a phishing attack.

You should have the link for it now.

I’m going to try different image qualities right now. I have the following setup:

sensor.reset()                      # Reset and initialize the sensor.
sensor.set_pixformat(sensor.RGB565) # Set pixel format to RGB565 (or GRAYSCALE)
#sensor.set_framesize(sensor.CIF)   # Set frame size to CIF (352x288)
#sensor.set_windowing((280, 288))    # Set window.
sensor.set_framesize(sensor.QVGA)   # Set frame size to QVGA (320x240)
sensor.set_windowing((160, 210))    # Set window.
sensor.skip_frames(time = 2000)     # Wait for settings to take effect.
#sensor.set_auto_whitebal(False)     # Turn off white balance.

Do you have any suggestions on things I can try? Note: I have two extra fb so I needed to run at QVGA.

Not sure what you mean by your question.

Um, I will try to process this tomorrow since it should be a working feature. I’ll try to get to the step of verifying that you were doing everything right and performance just fell on the M7. That will then give something for Ibrahim to fix.

My question was just to see if there was an ideal sensor setup for the CNN or other values I should try instead of what I did. For example, if I use the exposure, what should be the milliseconds I use?

Okay, thank you for looking into my process for the CNN. Please let me know if there’s anything else I need to provide you and please keep me informed with your results.

Thank you,

Not really, just expect that the CNN needs to see images that look like what you trained on. Deviation from this will make the net not work at all.

Okay perfect! I took photos with the M7 and tested in the same environment with the same objects. This is how I could tell that there was a problem.

Sorry for the late reply and thanks for the offer. I have sent you all relevant scripts by e-mail. First you have to download the COCO database:

Labels for Train and Validation:

The next step is to unpack the database and place it in a convenient location on disk. Next the script must be executed. In lines 98 and 99, change the paths to the COCO database and to the output location for the processed data. The script scales the images to 128 x 128 and labels the data with categories 0 for no_person and 1 for person. In addition, images in which people take up very little of the image are removed. Finally, the data set is balanced.
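The labeling, filtering, and balancing steps described above can be sketched roughly as follows. This is not the script from the archive: the function name, the record format, and the 10% area threshold are all assumptions for illustration, and the scaling to 128 x 128 is left out.

```python
import random


def label_and_balance(records, min_person_fraction=0.1, seed=0):
    """Label, filter, and balance a person / no_person data set.

    records: list of (image_id, person_fraction), where person_fraction is the
    share of the image area covered by people (0.0 if there is no person).
    min_person_fraction is an assumed threshold, not a value from the thread.
    """
    persons, no_persons = [], []
    for image_id, fraction in records:
        if fraction == 0.0:
            no_persons.append((image_id, 0))      # label 0: no_person
        elif fraction >= min_person_fraction:
            persons.append((image_id, 1))         # label 1: person
        # otherwise the person is too small and the image is dropped

    # Balance by downsampling the larger class to the size of the smaller one.
    n = min(len(persons), len(no_persons))
    rng = random.Random(seed)
    balanced = rng.sample(persons, n) + rng.sample(no_persons, n)
    rng.shuffle(balanced)
    return balanced


records = [("a", 0.0), ("b", 0.5), ("c", 0.02), ("d", 0.0), ("e", 0.0), ("f", 0.3)]
for image_id, label in label_and_balance(records):
    print(image_id, label)
```

In this toy example image "c" is dropped because the person covers only 2% of the frame, and one of the three no_person images is discarded so that both classes end up with two samples each.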

The next step is training. All relevant scripts can be found at cmsisnn/persondetection_fast/. The structure is similar to your example scripts. In the commands file you can find all relevant commands for training. Furthermore, change the database path in persondetection_fast_train_test.prototxt. Last but not least, I left the solverstate and the quantized model in the folder in case you want to start without a database. I also attached the normalization file.

If you want to test the quantized Caffe Model on the data of your webcam you can use the scripts, and To do this please change the path to the trained model in the script (lines 16, 17, 18)

In you can find the code for the M7.

Many thanks in advance.

I couldn’t send you an email with the rar. Therefore I added the rar here. I sent the password by mail.
CodePersonDetection.rar (514 KB)

Hi, I will get to looking at this. I’ve put the email on my desktop. Please note that I’m under a lot of work right now. I have to start Kickstarter shipping, release v3.3.0, and do taxes. While I’d like to give this a higher priority I have to focus on the other things first.

Eric, do you mind sending me your Matlab program to test the CNN? It would be great to test my CNN that way. If so, can you email me at