Behavior of Referenced Memory & sensor.dealloc_extra_fb()

I’m trying to do flat field correction on the H7+

This entails converting the flat-field image to an ndarray, for which I’m allocating a frame buffer like so:

ff = sensor.snapshot().to_grayscale()

ff_buffer = sensor.alloc_extra_fb(
    sensor.width() * 4,
    sensor.height(),
    sensor.GRAYSCALE
).bytearray()

np_ff = ff.to_ndarray('f', buffer=ff_buffer)

Once it’s captured I keep it in memory for future use. However, I’d like to re-write some of my other buffer allocations to use custom context managers, so that I can streamline some of my image processing code. For example:

class GrayScaleFloatBuffer:
    """Allocate a frame-buffer region big enough to hold the image as a
    float32 ndarray (4 bytes per grayscale pixel), freed on exit."""

    def __enter__(self):
        buffer = sensor.alloc_extra_fb(
            sensor.width() * 4,
            sensor.height(),
            sensor.GRAYSCALE
        ).bytearray()
        return buffer

    def __exit__(self, exc_type, exc_val, exc_tb):
        sensor.dealloc_extra_fb()

This got me thinking about the consequences of calling sensor.dealloc_extra_fb() while a variable still references the frame buffer in question. It would make the flat field code a lot more succinct:

ff = sensor.snapshot().to_grayscale()

with GrayScaleFloatBuffer() as ff_buffer:
    np_ff = ff.to_ndarray('f', buffer=ff_buffer)
    # math ...

… but it would just delete the ndarray, np_ff, once the context exited, right?

Hi, we increased the heap of the H7 Plus to 8MB. It will probably grow even more from there.

There’s no need to use the extra frame buffers now as RAM regions. You can just allocate the numpy ndarray yourself and use it directly.

Additionally, you can cast ndarrays into images and back again with the latest Image() API.


To your question: yes, to_ndarray(buffer) does a shallow conversion, so np_ff would reference deallocated RAM.

Sorry, I think I oversimplified my code too much. This is done in the context of a class, and ultimately np_ff gets assigned to self.ff such that it would live beyond the context. Not sure what happens when dealloc_extra_fb is called - if self.ff becomes a landmine or not.

As for the first section of your response - I was using a frame buffer because the ndarray is effectively uncompressed and a lot bigger than the original image (too big for the heap). But where is the documentation for casting ndarrays to/from images? Is this different from to_ndarray?

You can use the Image() constructor on ndarrays to create images from them. Then to_ndarray() creates an ndarray from an image. The buffer argument allows using another memory region as the place to hold the data.

Yeah, when you dealloc_extra_fb, that memory address becomes available for re-use by the next alloc_extra_fb. Since there’s no MMU, you won’t get a segfault trying to access this RAM; however, it’s not safe. Generally, I’d discourage you from using the extra FB stuff moving forward. We will most likely deprecate it eventually, as the heap is much larger now.

Having access to the FB is useful for doing image processing at higher resolutions; I’m working with WXGA. The flat field image is in grayscale (0.968 MB), and converting it to a float ndarray for the sake of division multiplies its footprint by 4 (so it becomes 3.871 MB).

The issue is that I must then multiply the current image, which is RGB, by the inverse of the grayscale ndarray. The current RGB image is 1.935 MB, but converting that to an ndarray multiplies its size in memory by 6 (5.8 MB) which more than exhausts the heap. There’s enough memory in the buffer to circumvent this, so I figured I’d use it.
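The footprint multipliers above fall out of simple per-pixel arithmetic. A quick sketch, assuming GRAYSCALE is 1 byte/pixel, RGB565 is 2 bytes/pixel, and the float ndarrays are 4-byte float32 with 3 channels for RGB:

```python
# Per-pixel sizes (assumed formats, not measured on hardware).
GRAYSCALE_BPP = 1   # grayscale image: 1 byte per pixel
RGB565_BPP = 2      # RGB image in the frame buffer: 2 bytes per pixel
FLOAT32 = 4         # float ndarray element size
RGB_CHANNELS = 3    # an RGB float ndarray needs 3 floats per pixel

gray_to_float = FLOAT32 / GRAYSCALE_BPP               # 4x growth
rgb_to_float = (RGB_CHANNELS * FLOAT32) / RGB565_BPP  # 6x growth
```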

I could just use a smaller resolution, but I’m trying to push it as far as I can at the moment. I had also tried using uint8 ndarrays before switching to FBs, but they still exhausted the heap at the same resolution when using anything bigger than WVGA.

There was an `Image.div()` method at one point that (I think) would allow doing this process in the frame buffer with less RAM usage, but it appears to have been deprecated.

I see what you want to do.

Yeah, Image.div() was removed because doing a divide at all in any image code path is very slow. You have to use reciprocal division, which is typically a multiply followed by a shift. ARM Helium can do 8 of these in parallel per clock, so we see massive speed-ups (up to 4x) on next-gen systems for these types of algorithms.
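As a toy illustration of the idea (plain Python, not the firmware’s actual code): dividing by a constant becomes one multiply by a precomputed reciprocal plus a shift, with an occasional off-by-one unless a rounding term is added.

```python
# Reciprocal division sketch: x // d replaced by a multiply and a shift.
d = 7
recip = (1 << 16) // d   # precomputed once, outside the hot loop

for x in range(1000):
    approx = (x * recip) >> 16   # one multiply + one shift per pixel
    # The cheap variant can undershoot by 1; exact forms add rounding.
    assert approx in (x // d, x // d - 1)
```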

Note, you can build a custom firmware with a larger heap… just fork our repo, enable github actions, and apply this PR: boards: Increase heap sizes on SDRAM boards. by kwagyeman · Pull Request #2367 · openmv/openmv (github.com)

I recommend doing this versus using the extra_fb stuff, because, as mentioned, it will most likely be deprecated in the future.

Question: I haven’t ever thought about flat-field correction. Can you share some literature about this and the use cases? This could be trivially added to the firmware. Not right now though, as we have a lot of other stuff on our plate. But this sounds like a rather easy lineop function to add.

For example, I could bring back Image.div() with reciprocal division. However… I removed it because dividing one image by another generally yields a garbage image. How is this not the case with flat field correction?

Ah, cool! That’s really useful to know.

As for literature, if you google “Flat Field Correction” you’ll find a ton of articles/blog posts from image manipulation products (Adobe, National Instruments, FLIR) singing its praises, but for its technical application I was just going off the formula found on wikipedia: Flat-field correction - Wikipedia

It’s a technique for correcting vignetting (where an image becomes darker the further from the image center you go). It’s really useful for the wide angle lens.

For the OpenMV boards, the “Dark Frame” subtraction isn’t super necessary since the image formats use integers, so there isn’t much low-level noise to remove. I’m basically just doing C = R * (m/F), where m/F is part of a calibration process that gets saved out for later.
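A minimal sketch of that formula over a made-up 1-D “image” (illustrative values only): a uniformly lit scene recorded with vignetting comes back uniform after correction.

```python
# Flat-field correction C = R * (m / F), per pixel.
F = [256.0, 64.0]    # flat-field frame: bright center pixel, dark edge pixel
R = [160.0, 40.0]    # raw frame of a uniform surface, same falloff
m = sum(F) / len(F)  # mean of the flat field: 160.0

C = [r * (m / f) for r, f in zip(R, F)]
# The corrected frame is uniform again: [100.0, 100.0]
```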

You do need a reference Flat Field (calibration) frame, but I think the likes of Photoshop take in a file path to the image you want to correct and a path to its corresponding calibration frame.

You could add an instance method to the Image class like this:

def flat_fielded(self, ff: image.Image, overwrite: bool = False) -> image.Image:
    m = ff.get_statistics().mean()
    out = self * m / ff  # whatever this is in C
    return out

Where ff would be the calibration image. It would return a new image if overwrite = False, but write to the buffer used by the image instance if overwrite = True. The option to overwrite would be useful since it may not be necessary to keep the raw image around (in my use case I don’t care about it).

Though MATLAB has an implementation that somehow estimates the shading in the given image and does not need a reference flat-field image :thinking: imflatfield - 2-D image flat-field correction - MATLAB

As for how it’s not garbage - it actually is, if you’re only using integer math! I have experimented with saving out m/F as a grayscale bitmap with quality set to 100, and when multiplying a snapshot by that saved image the product looks terrible. There’s something about using 32-bit float ndarrays that gives much better results, even though the product is truncated back to uint8 when converted to an image.

So this works really well (R and F are images converted to float ndarrays):

int(float(R) * float(m) / float(F))

And this is garbage (R and F are converted to uint8 ndarrays):

R * m / F => int * int / int

You could definitely get a good result using 16-bit floating point numbers internally (32 bit is overkill I think). I see there are float16 declarations throughout the openmv repo, so it may be pretty easy to implement and get decent results out? Furthermore, I think this is equivalent to what I’m doing in the first pseudo-equation up there:

R * int(float(m) / float(F))

I just convert R to a float ndarray so I can have access to in-place multiplication: R *= m/F
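A toy sketch of the precision loss with hypothetical pixel values: if the per-pixel ratio m/F is truncated to an integer before the multiply - which is effectively what saving it as an 8-bit image does - most of the correction disappears.

```python
# Illustrative single-pixel values (not from a real calibration).
R, m, F = 150, 192, 128

as_float = int(R * (m / F))  # ratio kept as 1.5 -> 225
as_uint8 = R * (m // F)      # ratio truncates to 1 -> 150, correction lost
```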

Ah, I see how it’s not garbage. You multiply by the mean first, which prevents the results from going to 0.

However, to make this fast you’d have to allocate a reciprocal image, ffr = 65536 / ff, once. This would be 16 bits per pixel. Then, in your main loop, you could do out = (in * m * ffr) >> 16, which would be fast. Folding m into the reciprocal image is a possible optimization, but it would probably overflow, so you’d need two muls in the loop to be safe.
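A scalar Python sketch of that pipeline (illustrative values, not firmware code):

```python
# One-time setup: 16-bit fixed-point reciprocal of each flat-field pixel.
m = 128                    # mean of the flat-field image
ff = 100                   # one flat-field pixel
ffr = 65536 // max(ff, 1)  # reciprocal-image entry, 16 bits per pixel

# Main loop: per-pixel correction via two multiplies and a shift.
raw = 150
out = (raw * m * ffr) >> 16  # approximates raw * m / ff

exact = raw * m // ff        # 192; fixed-point result lands within 1 of it
assert abs(out - exact) <= 1
```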

Anyway, I’ll keep this in mind as a nice feature to have. It’s not hard to add, but it’s not a simple lineop since you have to allocate the reciprocal image first.


Ah, another error on my part: I am actually doing R * (m/F), but if (R * m) / F is somehow better then hey, good to know :+1:

Saving m/F should be fine. The justification is that the flat field image is an image of a uniformly illuminated surface. Therefore the dark regions within it should be above some threshold to be useful. If there are pixels in the flat field image really close to 0, then it’s not really a “good” flat field image, and the division overflowing is an indication that the user needs a better reference image.

> Note, you can build a custom firmware with a larger heap

Coming back to this after a while - I realized this may not be a good solution, right? If you increase the size of the heap to accommodate your own np processing, you’re stealing memory from the frame buffer, which the built-in processing algorithms rely on. So further segmenting the RAM could lead to other headaches that are harder to solve than just managing buffers.