4.5.8 Memory Leaks and abnormal behavior (on custom hardware with identical H7 Plus components)

My project runs quite well with firmware up to build 4.5.5 (I never used gc.collect).
After that I started having to use gc.collect more often, which is slowing things down.
With 4.5.8 I have to use gc.collect about every 50 frames when post-processing frames recorded to SDRAM.

The process in my project is: 1) take about 1000 greyscale HQQQVGA snapshots, saving each frame to SDRAM as per the example script; 2) seek to frame 0 and then read every frame to extract features (building arrays of features based on image stats); 3) mark frames to be classified by ML; 4) classify that subset of frames and return an array of classified images.

Without using gc.collect very often, although I have pre-allocated memory (i.e. array sizes are defined before extracting features), I read out corrupt frames from SDRAM and get random failures during part 2). It is hard to debug because the error messages are random, when there are error messages at all.
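
A minimal sketch of the pattern described above, assuming the per-frame work can be reduced to a callable that returns four small flags. The four int8 arrays are allocated once, filled in place, and gc.collect() runs every 50 frames. The names extract_flags, N_FRAMES and GC_EVERY are hypothetical and not taken from the project; on the camera, extract_flags would read a frame back from SDRAM and compute its statistics.

```python
import gc
from array import array

N_FRAMES = 1000   # roughly the number of recorded frames
GC_EVERY = 50     # collect every 50 frames, as currently needed on 4.5.8


def make_feature_arrays(n_frames=N_FRAMES, n_arrays=4):
    # Allocate every array once, up front; each holds one int8 per frame.
    return [array("b", bytes(n_frames)) for _ in range(n_arrays)]


def fill_features(features, extract_flags, n_frames=N_FRAMES, gc_every=GC_EVERY):
    # extract_flags(i) is a hypothetical stand-in for "read frame i and derive
    # its features"; it must return one int8-sized value per feature array.
    collections = 0
    for i in range(n_frames):
        flags = extract_flags(i)
        for arr, value in zip(features, flags):
            arr[i] = value              # in-place write, no new array is created
        if (i + 1) % gc_every == 0:
            gc.collect()
            collections += 1
    return collections


if __name__ == "__main__":
    feats = make_feature_arrays()
    # Dummy extractor, used only to exercise the loop off-device.
    n = fill_features(feats, lambda i: (i % 2, 0, 1, 0))
    print("collections:", n, "flags for frame 1:", [a[1] for a in feats])
```

Writing into pre-sized arrays avoids growing them, but temporary objects created per frame (statistics objects, tuples, strings for printing) still land on the heap, which may be why collection is still needed.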

Can you provide a simple script that produces the issue? We increased the heap massively in the latest firmware. It was a few KB but now it’s in the MB range.

This is a bit concerning. As soon as you provide us with a test script we’re going to debug and fix this and, if necessary, do a patch release. We just need to be able to reproduce it first.

I’ll try my best in the next few days.

Consider that I have RPC over SPI loaded with about 12 callbacks. I don’t know if that adds to the issue. I am also using the newer build with the older rpc.py because the newer rpc.py doesn’t work.

I don’t think RPC/SPI is related. There were major changes recently that broke some things. We’ve fixed everything again in the latest release, but there’s always a chance that we’ve missed something.

Going to be more difficult than I thought.

I can only replicate it on my device with
SPI callbacks that call a function that performs
→ snapshots recorded to SDRAM, which passes the stream to a function that performs
→ operations that extract stats on a certain round of images from the stream, populating 4 arrays (int8, 1000 elements each).
It crashes before it can do any other operations.

If I work with a pre-recorded .bin and launch the functions from a script rather than via callbacks, and populate SDRAM etc. from there, there are no issues.

This leads me to think the problem lies in SPI or, for some reason, in the function that acquires snapshots and saves them to SDRAM. I can see proper images being output to the frame buffer and displayed in the IDE, though.

What would you suggest I do to debug? I’ll keep trying to recreate it with a simple script I can share here, but chances are slim that I’ll manage.

Again, the script worked through all the releases prior to 4.5.6 that I used, for some years now, so I’m quite confident my code is stable, although it could have issues that only become evident through recent changes you made.

Mmm, I would start by just adding print statements everywhere and delays after each one of like 4ms. Then, if you click on the IDE FPS label you can change the poll rate to 1000KHz for grabbing the text buffer. Then run your code. You should be able to catch a pretty decent log before the crash.

Cool, will try that. Really helps a lot.

For memory-related info to print, I only know gc.mem_free and gc.mem_alloc. Anything else?

It’s more about making a trace of what’s happening and using that to determine what is causing the crash. Afterwards, you can try removing that and seeing if it doesn’t crash. If this is the case then it’s easy to determine the root cause.
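
One possible shape for such a trace, as a sketch only: a small helper that prints a tagged line, appends the heap figures when the gc module provides them, and then pauses so the IDE has time to pull the text buffer. The helper name trace and its delay_ms parameter are made up for illustration. gc.mem_alloc(), gc.mem_free() and time.sleep_ms() are MicroPython functions, so the code guards for their absence and also runs on a desktop Python.

```python
import gc
import time


def _sleep_ms(ms):
    # MicroPython has time.sleep_ms(); fall back to time.sleep() elsewhere.
    if hasattr(time, "sleep_ms"):
        time.sleep_ms(ms)
    else:
        time.sleep(ms / 1000)


def trace(tag, *values, delay_ms=4):
    # gc.mem_alloc()/gc.mem_free() exist on MicroPython only; report -1 otherwise.
    alloc = gc.mem_alloc() if hasattr(gc, "mem_alloc") else -1
    free = gc.mem_free() if hasattr(gc, "mem_free") else -1
    body = " ".join(str(v) for v in values)
    line = "%s %s alloc=%d free=%d" % (tag, body, alloc, free)
    print(line)
    _sleep_ms(delay_ms)     # pause after every print, as suggested above
    return line


if __name__ == "__main__":
    trace("FRAME", 156, 57, 24)
```

Calling trace() at the start and end of each step means the last line in the log brackets where the crash happened. On MicroPython, micropython.mem_info() prints a more detailed heap summary if the two gc counters are not enough.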

Tried setting text polling to 1 and 1000 (which is correct?), with delays even up to 10ms. It crashes at random points and the print statements have coherent values (i.e. H and W of the image read from SDRAM). No error messages. It just prints the last message that it was scripted to print.

mem_alloc - mem_free
starts at
98240 - 4239040
and creeps up to this before crashing (just populating arrays with pre-defined size)
139040 - 4198240

Running gc.collect() every loop with the prints and delays leads to the same result, with almost constant memory alloc:
67616
4269664
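
Taking the figures above at face value, a short calculation puts them in proportion. The three samples all add up to the same heap total (4,337,280 bytes), and allocation grows by 40,800 bytes before the crash, roughly ten times the 4,000 bytes held by four int8 arrays of 1000 elements. With about 4.2 MB still free at the time of the crash, running out of heap looks unlikely to be the direct cause, though that is an inference from these numbers only.

```python
# Figures quoted above, in bytes: (mem_alloc, mem_free)
start = (98240, 4239040)
before_crash = (139040, 4198240)
with_gc_each_loop = (67616, 4269664)


def heap_total(sample):
    # Allocated plus free should give the same heap size for every sample.
    return sample[0] + sample[1]


totals = [heap_total(s) for s in (start, before_crash, with_gc_each_loop)]
growth = before_crash[0] - start[0]       # allocation growth without gc.collect()
feature_bytes = 4 * 1000 * 1              # four int8 arrays, 1000 elements each
free_share = 100 * before_crash[1] / heap_total(before_crash)

print("heap totals:", totals)
print("growth before crash:", growth, "bytes")
print("pre-allocated feature arrays:", feature_bytes, "bytes")
print("free heap before crash: %.1f%%" % free_share)
```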

Any other suggestions?

Loaded the latest dev release. Same identical files. The SPI callback I use to trigger the function that is crashing doesn’t even start when it is supposed to.

Hi, is it possible for you to use a different RPC callback interface than SPI? Like UART? The pyb.SPI interface does direct DMA access. There could be issues with that, as it will only work when targeted at RAM that’s located in certain areas on the chip. machine.SPI on the STM32 uses the same base code. So, either one will do things using DMA.

If you could test via a system actuating the script via UART then this would narrow it down a lot for us.

I’m guessing that since the heap has been moved around that SPI DMA could be broken.

If I work with a pre-recorded .bin and launch the functions from a script rather than via callbacks, and populate SDRAM etc. from there, there are no issues.

This implies most of the code is fine. So, if we remove the SPI interface and use something like the UART one which does not use DMA then I think we have the smoking gun.

I tried with virtual USB and the code executes without crashing. The only difference is that the camera is capturing frames which are not representative of in-device operation.

This means that some “if” conditions won’t be entered, but those primarily just set zeros and ones in arrays.

I cannot swap out SPI for UART in my device, not without reworking my PCBs.

Please let me know how else I can support.

Did some more testing with SPI and the latest released build (old rpc.py):
Managed to catch this, after which it crashes without displaying an error:
FRAME 156 57 24
FRAME 157 57 24
FRAME 158 57 24
FRAME 159 57 24
FRAME 160 57 2 ← Abnormal and during recording all frames looked good.
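
A log like the one above can also be checked mechanically instead of by eye. The sketch below assumes every healthy line has the form "FRAME <n> 57 24" and flags anything else, including lines with a missing or non-numeric field. The names check_frame_log and EXPECTED_VALUES are hypothetical helpers, not part of the project.

```python
EXPECTED_VALUES = (57, 24)   # the two values printed after the frame number


def check_frame_log(lines, expected=EXPECTED_VALUES):
    # Return every FRAME line that is malformed or reports unexpected values.
    suspicious = []
    for raw in lines:
        parts = raw.split()
        if not parts or parts[0] != "FRAME":
            continue                      # not a frame line, ignore it
        try:
            frame, a, b = [int(p) for p in parts[1:4]]
        except ValueError:
            suspicious.append(raw)        # wrong field count or non-numeric field
            continue
        if (a, b) != expected:
            suspicious.append(raw)
    return suspicious


if __name__ == "__main__":
    log = [
        "FRAME 156 57 24",
        "FRAME 157 57 24",
        "FRAME 158 57 24",
        "FRAME 159 57 24",
        "FRAME 160 57 2",
    ]
    for line in check_frame_log(log):
        print("suspicious:", line)
```

Running the same check on the device right after each SDRAM read, before any statistics are computed, would show whether the corruption is already present in the frame as read back.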

And this
FRAME 67 57 24
FRAME 68 524 ← some frames hereafter skipped most likely because I put a try except around SDRAM read.
FRAME 73 57 24
FRAME 74 57 24
FRAME 75 57 24
FRAME 76 57 24
FRAME 77 57 24
FRAME 78 57 24
FRAME 79 57 24
FRAME 80 57 24
FRAME 81 57 24
Traceback (most recent call last):
File “”, line 53, in
File “rpc.py”, line 367, in loop
File “SPI.py”, line 62, in spi_start_recognition
File “DETECT.py”, line 511, in record_and_detect
File “DETECT.py”, line 67, in extract_stable_frames
AttributeError: ‘int’ object has no attribute ‘stdev’
OpenMV v4.5.8; MicroPython v1.23.0-r6; OPENMV4P-STM32H743
Type “help()” for more information.
>>>
Where line 67 is median calc.

median_ROI1 = current_frame.get_statistics(roi=c.ROI1).median()

Got this one too, shown below, after many tries (950 frames expected to be processed). I thought the processing would stay inside the blocking callback. Does rpc run anyway in the background?
FRAME 124 57 24
FRAME 125 57 24
FRAME 126 57 24
FRAME 127 57 24
FRAME 128 57 24

Traceback (most recent call last):
File “main.py”, line 53, in
File “rpc.py”, line 363, in loop
File “rpc.py”, line 296, in __get_command
File “rpc.py”, line 93, in _get_packet
File “rpc.py”, line 597, in get_bytes
OpenMV v4.5.8; MicroPython v1.23.0-r6; OPENMV4P-STM32H743
Type “help()” for more information.

I also tried making sure there are no calls from the master during execution of the callback, and SOMETIMES it doesn’t crash in that loop. Those times it crashes later with random errors in ml, of this type:

File “ml/preprocessing.py”, line 55, in call
TypeError: Can’t convert CSeq: %d

File “ml/model.py”, line 17, in
File “ml/preprocessing.py”, line 30, in call
ValueError: Expected an image input
%s to type

Hi Joe,

Given you say the code works fine except when using RPC to make things happen, this leads me to think there’s an issue with the RPC library.

Then, the RPC library is Python, so it can’t crash the board; however, the drivers it uses can. I’ve known the SPI module to crash things before when using DMA. So, if possible, could you remove your board from the system you are working with, then connect it to an Arduino using the UART version of the RPC bus to test? If the system is fine in this case then it’s 100% an issue with the SPI driver.

Hi

I tried with the virtual USB. Used my PC to send commands.

Doesn’t crash.

I call one function through USB, the one that crashes, and then another that gets the result from the first through a global variable.

I think that proves it’s the SPI. Does it?

I have some H7 boards I could hook up to an Arduino, but I’d like to avoid that if possible. In any case I wouldn’t be able to use the sensor in my device that way.

Sorry if I wasn’t clear. Using rpc USB I call the same callbacks as I would with spi rpc and it works with no issues.
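
For reference, the two-callback arrangement described above can be sketched as follows. This is an illustration only: the callback names, the 2-byte frame-count payload and the placeholder result are assumptions, not the project's code. It relies on the usual OpenMV rpc convention that a callback receives the request payload as bytes and returns the response as bytes.

```python
import struct

_last_result = None   # shared between the two callbacks


def start_recognition(data):
    # Hypothetical long-running callback: it keeps its result in a global
    # instead of returning it, then acknowledges with an empty payload.
    global _last_result
    n_frames = struct.unpack("<H", data[:2])[0] if len(data) >= 2 else 0
    _last_result = [i % 2 for i in range(n_frames)]   # placeholder for the real detection
    return bytes()


def get_result(data):
    # Second callback: returns whatever the first one left behind.
    if _last_result is None:
        return bytes()
    return bytes(_last_result)


if __name__ == "__main__":
    # Off-device exercise of the two callbacks. On the camera they would be
    # registered on an rpc slave interface (for example rpc.rpc_usb_vcp_slave())
    # with interface.register_callback(...) before calling interface.loop().
    start_recognition(struct.pack("<H", 8))
    print(list(get_result(bytes())))
```

Because the callbacks themselves do not depend on the transport, swapping the interface object is the only change between the USB test and the SPI setup, which is what makes the comparison meaningful.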

Okay, then it’s probably SPI DMA and where the heap stores memory allocations. The heap used to all be in one place on-chip, but now it’s been moved to different areas and some of it’s off-chip.

The fix for this may be simple, just to disable DMA if the memory buffer is in the wrong location. Let me generate a firmware for you really quick.
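
The logic of that fallback, sketched in Python purely for readability (the real check would live in the firmware's C driver). The region table is illustrative: the addresses are loosely modelled on an STM32H7-style layout, and which regions the SPI DMA can actually use is an assumption here, not something taken from the firmware. The names DMA_OK_REGIONS, dma_capable and choose_transfer are hypothetical.

```python
# Illustrative table of (start address, size in bytes) for DMA-safe regions.
# Assumption for this sketch only; not read from the firmware sources.
DMA_OK_REGIONS = (
    (0x24000000, 512 * 1024),   # example on-chip SRAM block
    (0x30000000, 288 * 1024),   # example on-chip SRAM block
    (0x38000000, 64 * 1024),    # example on-chip SRAM block
)


def dma_capable(addr, length, regions=DMA_OK_REGIONS):
    # True only if the whole buffer [addr, addr + length) lies inside one region.
    end = addr + length
    return any(start <= addr and end <= start + size for start, size in regions)


def choose_transfer(addr, length, regions=DMA_OK_REGIONS):
    # Fall back to a CPU-driven (polled) transfer when the buffer is not in a
    # region known to be safe for DMA.
    return "dma" if dma_capable(addr, length, regions) else "polled"


if __name__ == "__main__":
    print(choose_transfer(0x24000100, 256))   # buffer inside a listed region
    print(choose_transfer(0xC0000000, 256))   # example off-chip address
```

Checking the end of the buffer as well as its start matters, since a buffer that begins in a safe region can still run past its boundary.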

Please confirm which board you are using before I do this.

H7 Plus. Thanks.