Image copy between extra framebuffers regardless of pixelformat or framesize

I am doing image processing on an H7 Plus that requires one or two extra frame buffers (EFBs), and I need to deep copy images from one EFB to another. To reuse the same buffer, I am trying to avoid reallocation when changing the image pixel format or size. However, I cannot find a function that allows me to do this.
I know it can be risky to deep copy images that have a different pixel format or size, but if I am careful to ensure the new image format and size fit within the allocated space, it should not be a problem.
Is there a function that can do just that deep copy operation?

import sensor, image

sensor.reset()
sensor.set_pixformat(sensor.RGB565)
sensor.set_framesize(sensor.FHD)
sensor.skip_frames(time = 2000)

dumpImageBytes = True

def DumpFewImageBytes(info, img):
    global dumpImageBytes
    if not dumpImageBytes:
        return
    print(info, img.bytearray()[0], img.bytearray()[1], img.bytearray()[2], img.bytearray()[3])

copyImg = sensor.alloc_extra_fb(sensor.width(), sensor.height(), sensor.RGB565)
copyImg2 = sensor.alloc_extra_fb(sensor.width(), sensor.height(), sensor.RGB565)

def CopyFBImage(dest, src):
    #dest = src.copy(copy_to_fb=True)   # always copies into the original snapshot FB and returns nothing (destination unchanged)
    #dest = src.copy()                  # not enough space
    dest.replace(src)                   # draws in destination format
    #dest.set(src)                      # draws in destination format
    #dest.assign(src)                   # draws in destination format
    #dest.blend(src)                    # draws in destination format
    #dest.draw_image(src)               # draws in destination format

#Test1 - copy from one extra FB to another extra FB, same pixel format
print("Test1 snapshot")
img = sensor.snapshot()
copyImg.clear()
copyImg2.clear()
DumpFewImageBytes("img", img)
DumpFewImageBytes("copyImg", copyImg)
DumpFewImageBytes("copyImg2", copyImg2)
print("EFB -> EFB2")
CopyFBImage(copyImg, img)
img.clear()
CopyFBImage(copyImg2, copyImg)
DumpFewImageBytes("img", img)
DumpFewImageBytes("copyImg", copyImg)
DumpFewImageBytes("copyImg2", copyImg2)

#Test2 - copy from default FB to extra FB, different pixel format
print("Test2 snapshot")
img = sensor.snapshot()
copyImg.clear()
copyImg2.clear()
DumpFewImageBytes("img", img)
DumpFewImageBytes("copyImg", copyImg)
DumpFewImageBytes("copyImg2", copyImg2)
print("FB (GRAYSCALE) -> EFB (RGB565)")
img.to_grayscale()
CopyFBImage(copyImg, img)
DumpFewImageBytes("img", img)
DumpFewImageBytes("copyImg", copyImg)

draw_image() basically does a deep copy and translates it into the destination buffer format. replace() and set() are just aliases for it in the latest firmware.
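That translation is per-pixel work, which is where the extra cost comes from. For example, packing an 8-bit gray value into an RGB565 pixel looks roughly like this (a sketch of the conversion idea, not OpenMV's actual code):

```python
def gray_to_rgb565(g):
    # Truncate the 8-bit value to 5-6-5 channel widths and pack into 16 bits.
    r = g >> 3                      # 5-bit red
    gr = g >> 2                     # 6-bit green
    b = g >> 3                      # 5-bit blue
    return (r << 11) | (gr << 5) | b

# Pure white (255) packs to 0xFFFF, pure black (0) to 0x0000.
```

A raw deep copy skips this per-pixel shifting and packing entirely, which is why it would be faster.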

Yes, replace() transforms the buffer values based on the destination image format, and that takes more time. A simple deep copy, including the image format, would be faster, since it does no transformation.

Mmm, you basically want to use the image buffer as a bytearray().
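If the goal really is a raw byte copy with no format translation, the buffer is just bytes; a plain-Python sketch of that idea (buffer sizes illustrative; on the camera you would slice `img.bytearray()` instead of these stand-in arrays):

```python
# Two preallocated "frame buffers" as raw byte arrays (plain-Python
# stand-ins for what img.bytearray() exposes on OpenMV).
BUF_SIZE = 640 * 480 * 2                 # e.g. VGA RGB565: 2 bytes per pixel
src = bytearray(range(256)) * (BUF_SIZE // 256)
dst = bytearray(BUF_SIZE)

# Deep copy the raw bytes with no pixel-format translation. A smaller
# image (different format or size) fits as long as its byte length does
# not exceed the allocation.
used = 320 * 240                         # e.g. QVGA grayscale: 1 byte per pixel
memoryview(dst)[:used] = memoryview(src)[:used]
```

The copy itself is format-agnostic; the risk the question mentions is exactly that nothing here checks the bytes make sense in the destination's declared format.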

I have to think about this. It’s not like something you’d generally want to do.

Can you explain exactly what you want to happen step by step? I may be able to tell you how to do that as fast as possible.

For the two-extra-buffer case, it is the simple difference between two consecutive frames:

    originalImg.replace(img)
    #subtract previous frame
    img.difference(previousImg)
    previousImg.replace(originalImg)

Would DMA work with the FMC in the OpenMV firmware, to make the memory copy faster?

No, the processor already hits the max bandwidth of the memory bus. You just need to do less work.

Maybe try this:

fb0 = sensor.alloc_extra_fb(sensor.width(), sensor.height(), sensor.RGB565)
fb1 = sensor.alloc_extra_fb(sensor.width(), sensor.height(), sensor.RGB565)

i = 0
fbs = [fb0, fb1]
fbs[i].replace(sensor.snapshot())
i = i ^ 1

while(True):
    img = sensor.snapshot()
    fbs[i].replace(img)
    i = i ^ 1
    img.difference(fbs[i])

This copies the new image to a buffer to be used in the next loop and uses the older buffer to difference against.
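The ping-pong indexing works because `i ^ 1` flips between 0 and 1, so after the toggle `fbs[i]` refers to the buffer written on the previous iteration. A stand-alone sketch of just that bookkeeping, with byte strings standing in for frames:

```python
frames = [b"frame0", b"frame1", b"frame2", b"frame3"]
fbs = [None, None]                 # stand-ins for the two extra FBs

i = 0
fbs[i] = frames[0]                 # prime with the first frame
i ^= 1

diffed_against = []
for frame in frames[1:]:
    fbs[i] = frame                 # store the new frame for next iteration
    i ^= 1                         # now fbs[i] is the *previous* frame
    diffed_against.append(fbs[i])  # what difference() would run against
```

Each new frame is stored in one buffer while the other still holds the frame immediately before it, so no third copy is needed.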

Your solution is good for observation purposes in the IDE, since we end up with the resulting image in the buffer the IDE fetches.
In our final solution we just process the extra buffer directly with a simple

    img = sensor.snapshot()
    tmpImg.difference(img)
    #process tmpImg further here
    tmpImg.replace(img)

The problem remains that we cannot do too many simple operations, since that limits the resolution to QVGA or below to maintain at least 15 fps for the processing. We reached almost 14 fps at VGA, acquiring grayscale directly and tracing the movement direction of the single largest object. Half or more of the time is spent on simple operations like move, add, and subtract. After converting to binary at the end of the processing, it gets faster.
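For scale: a VGA grayscale frame is 300 KB, and a 15 fps target leaves roughly 67 ms per frame, so every extra full-frame pass (move, add, subtract) eats a measurable slice of that budget. The arithmetic, using the numbers from this thread:

```python
# Per-frame time budget at the stated targets (plain arithmetic).
width, height = 640, 480                 # VGA
gray_frame_kb = width * height / 1024    # grayscale: 1 byte per pixel

budget_ms_15fps = 1000 / 15              # time available per frame at 15 fps
achieved_ms_14fps = 1000 / 14            # time actually spent at ~14 fps
```

At 14 fps each frame takes about 71 ms against a 67 ms budget, so shaving even one full-frame pass can close the gap.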

Do you have the latest firmware? We optimized a lot of these ops recently. If you check under the About tab on the website and follow the performance benchmark link, you can see how long each of these ops takes to run per camera model. I’ll post an expected performance number in a bit; not in the office.

I get 40 FPS with RGB565 VGA with this on the H7 Plus:

# Untitled - By: kwagy - Tue Sep 10 2024

import sensor, image, time

sensor.reset()
sensor.set_pixformat(sensor.RGB565)
sensor.set_framesize(sensor.VGA)
sensor.skip_frames(time = 2000)

clock = time.clock()

fb0 = sensor.alloc_extra_fb(sensor.width(), sensor.height(), sensor.RGB565)
fb1 = sensor.alloc_extra_fb(sensor.width(), sensor.height(), sensor.RGB565)

i = 0
fbs = [fb0, fb1]
fbs[i].replace(sensor.snapshot())
i = i ^ 1

while(True):
    clock.tick()
    img = sensor.snapshot()
    start = time.ticks_us()
    fbs[i].replace(img)
    i = i ^ 1
    img.difference(fbs[i])
    print(clock.fps(), time.ticks_diff(time.ticks_us(), start))

40.3264 19640

Looks like the overhead of the processing is 20ms.
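The numbers line up: 40.3 fps corresponds to about 24.8 ms per frame, and the measured replace() + difference() span is 19.64 ms of that, leaving only ~5 ms for capture and everything else. Checking with plain arithmetic:

```python
fps = 40.3264                          # reported by clock.fps()
overhead_us = 19640                    # reported by time.ticks_diff()

frame_time_ms = 1000 / fps             # total time per frame
overhead_ms = overhead_us / 1000       # copy + difference time
remaining_ms = frame_time_ms - overhead_ms
```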

I ran the code on the latest sources to confirm the 40 fps, and compared it with the fps reported by v4.5.1, which came with the device and reported 25 fps. A very good time improvement! Thank you to all the people involved.
But I had already used master to get the results in the previous post. I recently updated from tag 4.5.8 to tag 4.5.9, but 4.5.9 kept resetting the device, even without our local changes included, so I took master, at v4.6.0, and this version seems to be faster. However, in order to filter some noise on zoomed views, we added a few extra lines of code and switched from VGA to HVGA, so the timing changed from our previously recorded times.

Cool! Good to hear it’s working for you! (I think?)

Yeah, I went through optimizing these algorithms earlier this year. I’ll soon have changes vectorizing them for the M55 as well, which will max them out when we start releasing boards with the new MCU core. We’re already seeing great improvements of 2x to 4x in testing.