Image processing performance improvements

I am performing a handful of operations to simplify the frame and then find blobs that appear to be a ball. I iterate through the blobs later in the loop, but most of the frame time appears to be spent in the image-processing portion. Here are the operations I am performing, currently at QVGA resolution. Ideally I’d like to increase to VGA, but that drops my frame rate significantly. I believe the close operation is the most taxing - are there any thoughts on how I can increase performance?

QVGA gets me about 25 FPS on the RT1062.

        img2.replace(img) # Save copy of frame for processing, the original is displayed on lcd
        img2.difference(extra_fb) # Get frame difference (from a copy made earlier)
        img2.binary([(binaryThresholdLow,binaryThresholdHigh)]) # Convert to binary, (25,255)
        img2.close(closeImageValue,threshold=closeImageThreshold) # Value is 2, threshold is 6

Yeah, I have some PRs open on the main repo right now that will 4x the performance of difference. Close/open though have not been improved in performance. Same for binary. There’s not much that can speed those up.

The biggest culprit is the lack of DMA offload by the camera sensor driver. I will be able to start work on that once another PR for sensor driver features is merged. This will massively improve performance at VGA resolution.

At the moment, though, it won’t be any faster for a while. The H7 Plus has an optimized camera driver, so you should see a great deal more speed on that platform in the meantime.

I’m able to get upwards of 35 FPS at HVGA with everything except the close function. Do you have any suggestions for a more performant alternative to closing an image? Maybe taking a step back to what I am trying to do: I have a greyscale image of a ball in flight; I detect that it is in flight by using frame differencing and converting to binary for easy blob detection. The ball has some texture, so it sometimes breaks into multiple blobs. I’ve opted not to merge blobs because merging can combine the ball with a moving person. The close operation has been reliable, but now that I have stepped up from QVGA to HVGA (and ideally eventually to VGA), the frame rate is limiting.

The proper way to do this would be through something called the CamShift operation. However, we don’t support that OpenCV method just yet.

So, what are you tracking exactly? Blobs via find_blobs()? What I’m getting at is that it costs nothing to merge detections in software using their blob attributes. E.g., can you work with a noisier list of detections than the one you’d get after using close()?
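To illustrate the idea, here’s a hypothetical sketch of merging noisy detections in software rather than relying on close(). It reduces each blob to a plain (x, y, w, h) tuple (the same shape OpenMV’s blob.rect() returns) so the merge logic is testable anywhere; the function names are made up for this example.

```python
# Hypothetical sketch: merge nearby detection rects in software instead of
# calling close() on the image. Rects are (x, y, w, h) tuples.

def rects_touch(a, b, margin=2):
    # True if the two rects overlap or sit within `margin` pixels of each other.
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return (ax - margin < bx + bw and bx - margin < ax + aw and
            ay - margin < by + bh and by - margin < ay + ah)

def union(a, b):
    # Smallest rect covering both a and b.
    x = min(a[0], b[0])
    y = min(a[1], b[1])
    x2 = max(a[0] + a[2], b[0] + b[2])
    y2 = max(a[1] + a[3], b[1] + b[3])
    return (x, y, x2 - x, y2 - y)

def merge_rects(rects, margin=2):
    # Repeatedly merge any pair of touching rects until none remain.
    rects = list(rects)
    changed = True
    while changed:
        changed = False
        out = []
        while rects:
            r = rects.pop()
            i = 0
            while i < len(rects):
                if rects_touch(r, rects[i], margin):
                    r = union(r, rects.pop(i))
                    changed = True
                else:
                    i += 1
            out.append(r)
        rects = out
    return rects
```

You could then drop any merged rect whose area exceeds a person-sized cutoff before treating the rest as ball candidates, which gives you the selectivity that a blanket close() can’t.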

We will eventually get all of this code optimized over the coming year, but right now it’s not as fast as it can be. That said, let me give you a binary with the new line ops enabled. difference(), as I mentioned, is 4x faster. This might help, since close() uses difference internally, so it’s effectively being called twice.

Here’s an example where merged blobs are an issue. Is there a good way to be selective about which blobs to merge? I know there is a minimum size, but I’d want to selectively filter out large blobs, like a person.

Hi, here’s a firmware with the new line-ops PR (modules/py_image: Optimize and cleanup all math and binary line ops. by kwagyeman · Pull Request #2061 · openmv/openmv), compiled for the RT1062.

You should see a 4x speedup with difference(). The PR switches the implementation over to Cortex SIMD instructions.

As for dealing with the blobs in that image.

This is simple: ignore blobs that are too big. You can still use the merge=True argument with find_blobs(), by the way. Just point the callback at a Python method that looks at each blob and filters it out if it’s too large. See the threshold_cb argument that find_blobs() supports.
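A minimal sketch of that filtering idea, in pure Python so it runs off-device (the MAX_BALL_AREA cutoff is a made-up value; on the camera the callback would receive a blob object and call blob.area()):

```python
MAX_BALL_AREA = 400  # hypothetical cutoff; tune so a person-sized blob exceeds it

def small_enough(area):
    # On-device this becomes: def cb(blob): return blob.area() <= MAX_BALL_AREA,
    # passed as threshold_cb= to find_blobs() so oversized blobs are rejected
    # before merge=True combines anything.
    return area <= MAX_BALL_AREA

candidate_areas = [120, 80, 5000, 200]  # 5000 ~ a moving person
ball_candidates = [a for a in candidate_areas if small_enough(a)]
```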

Is there a way to compile a firmware that includes the changes in Image processing performance improvements - #6 by kwagyeman?

I added a callback that appears to be working, but perhaps too well for my use case (see below). Without any close operation, my head (and even my beard) ends up separated into its own smaller blob, and since it is then under the threshold it is never joined with the rest of me to be filtered out. I suspect I can dial in my binary thresholds, but I worry that this might be error-prone as different people/clothing become the subject. Any thoughts on this?

def area_filter(blob):
    return minBallArea <= blob.area() <= maxBallArea

for blob in img2.find_blobs(
        [thresholds], pixels_threshold=minBallPixels, area_threshold=minBallArea,
        merge=True, margin=2, x_stride=stride, y_stride=stride, threshold_cb=area_filter):
    # ... per-blob tracking logic ...

Is this a valid statement?

return minBallArea <= blob.area() <= maxBallArea

Chained comparisons like that are actually valid Python, so the statement works as written. It’s equivalent to the more explicit:

return (minBallArea <= blob.area()) and (blob.area() <= maxBallArea)
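Either form behaves the same. A quick off-device check, using plain numbers as stand-ins for blob.area() and the min/max area variables:

```python
lo, hi = 50, 400  # stand-ins for minBallArea / maxBallArea

def chained(a):
    # Python evaluates lo <= a <= hi as (lo <= a) and (a <= hi).
    return lo <= a <= hi

def explicit(a):
    return (lo <= a) and (a <= hi)

# Compare the two forms for values below, on, inside, and above the range.
results = [(chained(a), explicit(a)) for a in (10, 50, 200, 400, 1000)]
```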

Yeah, you just follow the build guide on GitHub for how to compile the firmware, then check out the branch and build it.