Implementing low-cost pre-trained NNs like MobileNet/SqueezeNet


Just wondering whether anybody here has managed to convert some of the popular low-cost NNs out there, such as MobileNet_v1 or SqueezeNet_v1.0, to run on the OpenMV M7? I understand that the accuracy will not be optimal, but we are looking at a quick proof of concept for NN implementation on embedded devices for edge computing.

There is currently a heap size limitation on the M7 of ~50 kB, so that pretty much restricts what kinds of networks can be loaded in. However, I would really appreciate it if anyone could share whether they managed to squeeze a network down to fit, e.g. via deeper compression or feature-reduction techniques.
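For a rough sense of the gap, here is a back-of-envelope calculation in Python. The parameter counts are approximate published figures for the full-size models, and the 1-byte-per-weight assumption corresponds to 8-bit quantization; both are assumptions for illustration, not measurements on the M7.

```python
# Rough sketch: approximate parameter counts of the full-size models
# vs. the ~50 kB M7 heap budget, assuming 8-bit quantized weights.
HEAP_BUDGET_KB = 50

models = {
    "MobileNet_v1 (1.0, 224)": 4_200_000,  # ~4.2M params (approximate)
    "SqueezeNet_v1.0": 1_250_000,          # ~1.25M params (approximate)
}

for name, params in models.items():
    size_kb = params / 1024  # 1 byte per weight at int8
    print(f"{name}: ~{size_kb:.0f} kB quantized, "
          f"~{size_kb / HEAP_BUDGET_KB:.0f}x over the {HEAP_BUDGET_KB} kB budget")
```

Even with aggressive 8-bit quantization, both models land well over an order of magnitude above the heap budget, which is why pruning or a much smaller architecture would be needed.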



Hi, right now CMSIS-NN doesn’t support any of these network types, so we can’t support them. If ARM puts more effort behind this, then maybe. I’ve been besieged by admin issues and have been unable to release some work I did a few months ago, but we have network classifiers working nicely, e.g. running CHARS74K and so on.

Hey kwabena, great hearing from you.

I found the git repo links to some of the additional NN examples you mentioned, which were recently published. In particular, I was interested in the INRIA people detection example that you have.

I’ve downloaded the NN file onto the M7 cam and was trying to run it with your script. However, I’m getting some errors when loading the network: in particular, at the line “max_idx = out.index(max(out))” I’m getting a “ValueError: arg is an empty sequence”. I’ve tried resetting the board a couple of times, and at other times running the “out = net.forward(img, softmax=True)” line crashes and resets the entire system. I suspect there may be an error in the .network file that is loaded onto the system.
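For what it’s worth, the ValueError itself can be guarded against even if the underlying network load is broken. A minimal sketch in plain Python (standing in for the MicroPython script; `scores` stands in for the list returned by `net.forward`):

```python
def safe_argmax(scores):
    """Return the index of the max score, or None if the network
    returned an empty output (the cause of the ValueError above)."""
    if not scores:  # guard: out was an empty sequence
        return None
    return scores.index(max(scores))

# Mirrors `max_idx = out.index(max(out))` from the script,
# but degrades gracefully when the forward pass produced no output.
print(safe_argmax([0.1, 0.7, 0.2]))  # 1
print(safe_argmax([]))               # None
```

This only avoids the crash at the argmax line; an empty output still indicates the .network file failed to load or run.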

Just wanted to check with you whether you’ve encountered any of these errors before?

Thank you very much for your help!

The INRIA network doesn’t work; only the CHARS74K one does. The car detection ones technically do work but were overfit. The INRIA one doesn’t work yet because I ran into an issue with our code handling non-square images. I have to find more time to put into fixing this network.

Which portion of the code is generating the issue? Is it the CMSIS-NN functions, or the training phase when generating the .network file? If you don’t mind sharing more info on the problem, I’m all in to help resolve the issue! :relaxed: Let me know how I can help.


Um, from what I understand Ibrahim put in a fix to make the non-square network work, but he told me that the network still doesn’t perform. On the desktop I got rather high training and test scores, so it should be okay on the hardware; the test score was still high after conversion. I really just haven’t had time to work on OpenMV lately. My day job is eating all my time.

Hi kwabena, just a question on the memory heap onboard the M7. Not sure whether this has been raised before.

I was wondering: instead of storing the network weights and coefficients in the MCU heap, which is limited in size, is it possible to store them in an external SRAM? I presume the network parameters do not require very high-speed retrieval, which is why they can be placed on the heap. Just a thought for future development: could we implement, say, a parallel FMC interface to external SRAM for network storage so that we are not constrained by the MCU’s internal heap?



Hi, if you want to develop a board design that does that, you can do so. We don’t have external RAM on board because it would be cost-prohibitive. External SDRAM would take an 8-layer board, require a smaller form-factor H7 chip, and a host of other things. The H7 (our next system) is actually a lot less profitable than the M7; our margins are going down on it, in particular due to the removable camera system. However, we added that feature to ensure the system would have expanded use cases and thus more sales.

That said, you could do this in the same form factor we have now. But OpenMV would need to be producing units in much higher volumes than we currently are for it to make sense. The STM32 line of chips doesn’t have nice SDRAM controllers: they can only interface with SDR SDRAM, which is quite expensive compared to DDR3 SDRAM. It would be awesome if HyperBus RAM were supported by the STM32. It’s like SDR RAM but comes in a very small package and needs only 11 pins versus 40+ I/O pins.

All that said, the software changes once you add SDRAM would not be a lot. You’d just need a heap allocator for the RAM, which you could get online from someone’s malloc implementation, and then change the alloc calls in our code to target that malloc, which in turn targets the SDRAM.

Thanks Kwabena,

So it’s possible! Cool! I understand the implications of keeping the product scalable while balancing profit margins. It’s never an easy decision to make.

As for the lack of nice SDRAM controllers on STM32s, I would think the worst-case fallback would be a simple serial SRAM interface via SPI, but that would probably penalize throughput further and bring down the overall FPS.
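To put a rough number on that penalty, here is a back-of-envelope calculation assuming the weights are streamed over SPI on every forward pass. The SPI clock and weight size are illustrative assumptions, not measured figures:

```python
# Back-of-envelope: how a serial SRAM link caps inference rate if
# weights are re-fetched over SPI each frame. All numbers are assumptions.
spi_clock_hz = 50e6                  # assumed SPI clock
bytes_per_sec = spi_clock_hz / 8     # 1 bit per clock, ideal bus
weights_bytes = 500 * 1024           # assumed 500 kB of quantized weights

fetch_time_s = weights_bytes / bytes_per_sec
max_fps = 1 / fetch_time_s
print(f"weight fetch: {fetch_time_s * 1000:.1f} ms -> at most {max_fps:.1f} FPS")
```

So even under ideal bus conditions the weight traffic alone caps the frame rate, before any compute is counted, which is why a parallel FMC interface would be preferable.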

Alright, thanks so much for taking the time to share with me. Do keep me informed should you guys manage to resolve the issue with the INRIA human detection code. :smiley:


We really can use all the help we can get with the OpenMV Cam project. If you’d like to try to make an SDRAM version of the board please go ahead.

As for the INRIA model. I think we fixed the issue with the system. I just haven’t had time to revisit it. I’m in the middle of doing library upgrades right now and adding sorely missed features and plan to continue doing this until the H7 launch.

Oh, so it’s fixed? Cool, I can try it out and test it. Is the network model updated in your Git repo? If so, I’ll go over, download the files, and try once more.



I haven’t looked at this in a few months. I don’t know if it works or not.