4.5.8 Memory Leaks and abnormal behavior (on custom hardware with identical H7 Plus components)

firmware.zip (1.1 MB)

Please give this firmware a try.

Loaded it up. SPI was not working, as usual, so I then put the 4.5.2 rpc.py on it (the last version that works for me).

Whatever you did definitely fixes the crashing. Out of 10 runs, my device crashed once, but I didn’t catch where, so ignore that for now.

However, I am getting unexpected final results back over SPI to the master. The symptoms would be consistent with getting much lower fps than normal (by a factor of 10). Maybe it has something to do with the dirty build. I will check tomorrow morning (EU time zone).
OpenMV v4.5.8-43.g0ad4d34a.dirty; MicroPython v1.22-omv.r17.434.g8f6a976de.dirty; OPENMV4P-STM32H743

P.S. I’m using the global shutter sensor at HQQQVGA with a 2700 µs fixed exposure and a 57x24 pixel window, which typically gets me above 300 fps.

The issue is this code:

micropython/ports/stm32/spi.c at 8f6a976deacc38c49830373091de14c7e2aed223 · openmv/micropython (github.com)

The SPI module deletes MPU regions we have configured instead of just doing normal cache maintenance. Switching these out for cache invalidates fixes things. However, there’s no really good solution for you in using DMA at all; I should just disable it entirely. It’s really not safe to use DMA transfers to randomly allocated bytearrays in MicroPython. Bytearrays need to be aligned to a 32-byte address and be a multiple of 32 bytes in length so that they do not share cache lines with other data. Also, the DMA module doesn’t have the deepest buffers, so it’s not clear whether it can handle SDRAM latency, which could cause it to stall.
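To make the alignment constraint concrete, here is a minimal sketch in plain Python of the check a DMA-safe buffer would have to pass, assuming 32-byte data cache lines as described above. The function names and the example addresses are hypothetical and only illustrate the arithmetic; this is not code from the firmware.

```python
CACHE_LINE = 32  # bytes per data cache line, as stated in the post above


def is_dma_safe(addr, length, line=CACHE_LINE):
    """True if [addr, addr + length) covers whole cache lines only."""
    return length > 0 and addr % line == 0 and length % line == 0


def padded_span(addr, length, line=CACHE_LINE):
    """Return (start, size) of the cache-line-aligned region enclosing the buffer."""
    start = addr - (addr % line)   # round the start down to a line boundary
    end = addr + length
    end += (-end) % line           # round the end up to a line boundary
    return start, end - start


# Hypothetical buffer addresses, for illustration only.
print(is_dma_safe(0x24000020, 64))   # aligned start, whole number of lines
print(is_dma_safe(0x24000024, 64))   # start is 4 bytes past a line boundary
print(padded_span(0x24000024, 64))   # region that cache maintenance would touch
```

The second helper shows why an unaligned buffer is risky: invalidating its cache lines also touches neighbouring bytes that belong to other objects.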

firmware.zip (1.1 MB)

Try out this firmware, I disabled DMA entirely. It should not crash and not have any possibility of data corruption.

Thanks, I think this indeed solves the original issue but broke something else. Either that or something else is broken in the source code you built on.

I am taking snapshots and saving them to SDRAM 950 times. Before this build the function would take approx. 3 seconds to execute. Now it takes 17 seconds.

Can you confirm this build is for H7 PLUS (not Pro) and the SDRAM read() write() calls didn’t change?

Yes, the build is for the H7 plus.

Question, are you telling your camera via RPC to take a picture each time? Or are you transmitting the data via RPC? Removing DMA mode would have slowed down the SPI bus speed considerably.

No. The callback invokes a function that takes 950 snapshots, recording each one to SDRAM.

It has nothing to do with SPI once it has been called.

I have also never had this issue before. Has something changed in SDRAM writing?

No, SDRAM speed remains unchanged.

We haven’t touched the image writer class in a while. Can you add some timing code and see what’s taking all the time?
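One way to collect that kind of timing is sketched below. `SectionTimer` is a hypothetical helper, not part of OpenMV. It assumes a clock function that returns milliseconds: on the camera you could pass `time.ticks_ms` (ignoring tick wraparound for short runs), while the default used here works on desktop Python.

```python
import time


class SectionTimer:
    """Accumulate per-section durations in milliseconds."""

    def __init__(self, clock_ms=None):
        # Default clock for desktop Python; on MicroPython pass time.ticks_ms.
        self._clock = clock_ms or (lambda: time.perf_counter() * 1000)
        self._samples = {}

    def measure(self, name, func, *args):
        """Call func(*args), record how long it took under `name`, return its result."""
        start = self._clock()
        result = func(*args)
        self._samples.setdefault(name, []).append(self._clock() - start)
        return result

    def report(self):
        """Return {name: (average_ms, max_ms)} for every measured section."""
        return {n: (sum(s) / len(s), max(s)) for n, s in self._samples.items()}


# Stand-in workload; on the camera this would wrap sensor.snapshot and stream.write.
timer = SectionTimer()
for _ in range(5):
    timer.measure("snap", sum, range(1000))
print(timer.report())
```

Wrapping the snapshot call and the write call separately gives the average and maximum per step, which is enough to see which of the two dominates the total time.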

Time in [ms]
Time to init sweep_stream 0
Average Snap Time 18.8558
Average Save Time 0.0157895
Max Snap Time 19
Max Save Time 1
Time to sweep_stream.seek(0) 0
Good frames 950 Bad Frames 0 ← Using try and except. Except increases bad frames.
Total time from call to return 17931

So for some reason snapshots are extremely slow now. I also tried removing the try… except… blocks, which have been there for years, but got the same result. This has been working for 3 years with my setup.

Global Shutter
Sensor Init:

    sensor.reset()
    sensor.set_pixformat(sensor.GRAYSCALE)
    sensor.set_framesize(sensor.HQQQVGA) #resolution
    sensor.set_windowing([3,13,57,24])
    sensor.set_auto_exposure(False, exposure_us = 2700)
    sensor.skip_frames(100) #Wait for sensor to init

Build 4.5.8 - Same identical setup as above otherwise.
Time to init sweep_stream 0
Average Snap Time 2.67263
Average Save Time 0.0115789
Max Snap Time 3
Max Save Time 1
Time to sweep_stream.seek(0) 0
Good frames 950 Bad Frames 0
Total time from call to return 2557

You may need to explicitly set the number of frame buffers before calling snapshot().

The smoking gun is to print the frame buffer count (sensor.get_framebuffers()). If it’s not 3 then the system is dropping frames. Note that each sensor setter call recalculates it, so read the value right before you call snapshot().

Since we’ve been changing how much RAM is dedicated to the frame buffer, the automatic algorithm may not be sufficient for you anymore. It only chooses 3 frame buffers if half of the free fb alloc RAM is left after doing so.

Note, given the push for more heap for ML, we will probably reserve 16MB out of the 32MB for the heap, meaning that the frame buffer size will be halved.

Also, since you are using the RAM-based frame recorder, that’s also taking frame buffer space. 950 small frames is likely using most of the RAM on the system. This, plus the current 4MB heap, probably pushed you out of triple buffer mode by default. You can force it by setting the frame buffer count to 3 before calling snapshot().
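A minimal sketch of that check follows. It assumes the OpenMV `sensor` module exposes `get_framebuffers()` alongside the `set_framebuffers()` call used in this thread. `force_framebuffers` and `FakeSensor` are hypothetical names; the fake only exists so that the logic can be run off-camera.

```python
def force_framebuffers(sensor_module, wanted=3):
    """Set the frame buffer count if the automatic choice differs.

    Returns (count_before, count_after) so both values can be logged.
    """
    before = sensor_module.get_framebuffers()
    if before != wanted:
        sensor_module.set_framebuffers(wanted)
    return before, sensor_module.get_framebuffers()


class FakeSensor:
    """Desktop stand-in for the camera's sensor module (hypothetical)."""

    def __init__(self, count=1):
        self._count = count

    def get_framebuffers(self):
        return self._count

    def set_framebuffers(self, count):
        self._count = count


# Off-camera demonstration: the automatic algorithm picked 1 buffer.
print(force_framebuffers(FakeSensor(1)))
```

On the camera, the call would be `force_framebuffers(sensor)` placed after the last sensor setter and right before the first snapshot(), since earlier setter calls recalculate the count.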

Finally, if doing this fixes things, can you confirm which of the two firmwares I provided fixes your issues? The non-DMA one, or the DMA one with cache ops (the first firmware I posted)? There’s definitely a bug we need to fix here.

…

When we change the heap size to 16MB, I will make sure to add a flag to the image writer so that you can allocate it on the heap instead of the frame buffer. That way you can still do what you are doing; otherwise you’d have a lot less RAM available.

Last firmware you sent where SPI works ->OpenMV v4.5.8-43.g0ad4d34a.dirty; MicroPython v1.22-omv.r17.434.g8f6a976de.dirty; OPENMV4P-STM32H743

Frame buffer count: 1 (WITHOUT SETTING FB counts)
Average Snap Time 18.8505
Average Save Time 0.0147368
Max Snap Time 19
Max Save Time 1

sensor.set_framebuffers(3)
Frame buffer count: 3
Average Snap Time 5.13053
Average Save Time 0.0126316
Max Snap Time 12
Max Save Time 1

sensor.set_framebuffers(5)
Frame buffer count: 5
Average Snap Time 3.43579
Average Save Time 0.00947368
Max Snap Time 13
Max Save Time 1

I can’t get back to the performance of build 4.5.8, and although it is faster with more frame buffers, I think some frames are corrupted, which causes crashes later on.

I think you solved the SPI issue, which you said was a bug to be fixed. The second issue is a breaking change for me between 4.5.8 and whatever you have in development. Can you confirm that you understand the reason based on the results above, and that you will provide a flag to allocate on the heap, as it always was until 4.5.8? Setting the frame buffer count doesn’t work for me.

Frame buffer count: 10
Average Snap Time 2.68737
Average Save Time 0.0126316
Max Snap Time 12
Max Save Time 1

Frame buffer count: 6
Average Snap Time 2.68211
Average Save Time 0.0178947
Max Snap Time 12
Max Save Time 1

Increasing the frame buffer count gets me back to the proper FPS, but the process crashes, either without any error or when performing get_statistics().median():
AttributeError: 'int' object has no attribute 'median'

This was also one of the failures I was getting before you disabled DMA

Sorry for the delay, I’m traveling this weekend.

If you want maximum performance and you are already planning on allocating a huge array to store images in RAM, then you will need to set the frame buffer count to 3. The automatic algorithm only picks that value if it sees you have a lot of free frame buffer space. You don’t need to make it any higher than that; making it higher turns it into a FIFO buffer rather than triple buffering.

Okay, so, DMA was definitely crashing things when enabled. We will debug when back at the office.

…

So, that error you are getting implies memory corruption.

Can you deduce what may be causing that new error? You mentioned already it was stable without SPI access. Does it not have issues if triggered via VCP?

| Configuration (never changing my core code project) | Crashing with SPI callbacks | Crashing with USB callbacks | FPS ok (2.6 ms snap) | SPI works |
| --- | --- | --- | --- | --- |
| no DMA 4.5.9 Build + rpc.py (v_4.5.2) + frame_buffers (3) | No | N/A | No - 5 ms avg | Yes |
| no DMA 4.5.9 Build + rpc.py (v_4.5.2) + frame_buffers (6) | Yes | Yes | Yes | Yes |
| no DMA 4.5.9 Build + rpc.py (v_4.5.2) + frame_buffers (1) | No | N/A | No - 18.8 ms | Yes |
| no DMA 4.5.9 Build + NO rpc.py (v_4.5.2) + frame_buffers (3) | N/A | N/A | N/A | No |
| 4.5.8 Build + rpc.py (v-4.5.2) (no forcing frame_buffers, auto set to 3) | Yes | No | Yes | Yes |
| 4.5.6 Build + NO rpc.py (v-4.5.2) | No | N/A | Yes | No |
| 4.5.6 Build + rpc.py (v-4.5.2) | No | N/A | Yes | Yes |
| 4.5.2 Build | No | N/A | Yes | Yes |

I’d really like to get back to the conditions I had with 4.5.6 Build + rpc.py (v-4.5.2), preferably also addressing the breaking changes in 4.5.3 that have forced me to use rpc.py from the prior version. Please let me know what I can do to assist. I’d hate to have to stay back on v_4.5.2 for production of my product. I’d like my product to evolve with yours and continue this mutual exchange.


Thanks for the table. Regarding the frame speed, I need a test script and a goal value. Please post the most minimal script for this.

It looks like the SPI error and the speed of the system are unrelated. So, if you can just give me a script I can run that should be faster I can figure out what went wrong.

As for needing to use an old version of the RPC code, I’m not quite sure why you need to do that. We only moved things to use the machine module. The SPI being buggy is unrelated to the Python code.

There is a misunderstanding of the table.
The SPI working vs. not working column (SPI works) refers to a separate bug that we corresponded about in another post, which you said you would look into: Upgrade to v4.5.3 - rpc spi no longer working
Let’s keep this separate for now, as I am not blocked.

The no-DMA build fixes the SPI-triggered crashes in my script, except when I increase the frame buffer count above 3. That is irrelevant because 3 is the correct setting.

I’ve always had above 300 fps taking snapshots and writing to SDRAM. That is the goal. (See the table: FPS ok = 2.6 ms to snapshot and save to SDRAM.)
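For reference, the per-frame times reported earlier in this thread convert to frame rates as sketched below. The helper names are made up for illustration; the millisecond values are the average snap times from the posted logs, and the 300 fps figure is the goal stated above.

```python
def fps_from_ms(frame_ms):
    """Frames per second for a given time per frame in milliseconds."""
    return 1000.0 / frame_ms


def ms_from_fps(fps):
    """Time budget per frame in milliseconds for a given frame rate."""
    return 1000.0 / fps


# Average snap times (ms) copied from the logs posted in this thread.
measurements = {
    "4.5.8, auto buffers": 2.67263,
    "no-DMA build, 1 buffer": 18.8505,
    "no-DMA build, 3 buffers": 5.13053,
    "no-DMA build, 5 buffers": 3.43579,
}

for label, ms in measurements.items():
    print("%-24s %8.4f ms -> %6.1f fps" % (label, ms, fps_from_ms(ms)))

print("300 fps budget: %.2f ms per frame" % ms_from_fps(300))
```

Read this way, the 2.6 to 2.7 ms snap time corresponds to roughly 370 fps, and anything above about 3.3 ms per frame misses the 300 fps goal.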

I presume running your example script to write to sdram is enough to reproduce the snapshots being delayed with 4.5.9. I’ll check and get back to you.

Code to reproduce the FPS regression. It is your example with modified sensor parameters.
Run this with:
Build 4.5.8 and prior -> 372 fps (my target fps, which I have been getting for a while now; my entire application is designed around this, give or take 20 fps)
4.5.9 dirty build without DMA -> 194.847 fps
4.5.9 dirty build without DMA and without the set_framebuffers(3) line -> 53 fps


# This work is licensed under the MIT license.
# Copyright (c) 2013-2023 OpenMV LLC. All rights reserved.
# https://github.com/openmv/openmv/blob/master/LICENSE
#
# Image Memory Stream I/O Example
#
# This example shows how to use the ImageIO stream to record frames in memory and play them back.
# Note: While this should work on any board, the board should have an SDRAM to be of any use.
import sensor
import image
import time

# Number of frames to pre-allocate and record
N_FRAMES = 950

sensor.reset()
sensor.set_pixformat(sensor.GRAYSCALE)
sensor.set_framesize(sensor.HQQQVGA) #resolution
sensor.set_windowing([3,13,57,24])
sensor.set_auto_exposure(False, exposure_us = 2700)
sensor.set_framebuffers(3)
sensor.skip_frames(100)

clock = time.clock()

# Write to memory stream
stream = image.ImageIO((57, 24, sensor.GRAYSCALE), N_FRAMES)

for i in range(0, N_FRAMES):
    clock.tick()
    stream.write(sensor.snapshot())
    print(clock.fps())

while True:
    # Rewind stream and play back
    stream.seek(0)
    for i in range(0, N_FRAMES):
        print("Playing Frame" , i)
        img = stream.read(copy_to_fb=False, pause=False)
        # Do machine vision algorithms on the image here.