erode/dilate time consuming

Hello,

I was amazed by how quickly and easily a little OpenMV Cam finds blobs, so I decided to see whether I can use these cams in my project.

I am trying to measure the dimensions of this kind of object:
stream-c1.png
After finding blobs, I want to erode(4) and dilate(4) to remove the white wire, but that quickly slows down the process. These are the computing times I measured for the code below:

  • Convert to binary: 1.73ms
  • Find blob (in grayscale): 1.04ms
  • Crop (from 34808px to 18696px): 0.82ms
  • img.erode(4): 9.59ms
  • img.dilate(4): 9.53ms
  • Find blob (in binary): 0.94ms
  • Measure: 0.76ms

A snapshot takes the camera 13.8ms, so the total cycle is about 38.2ms, and half of it is spent in the erode/dilate step.
In other words:

  • without erode/dilate=52 FPS
  • with erode/dilate=26 FPS
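
As a quick check of those figures, the per-step timings listed above can be summed and converted to a frame rate. This is only arithmetic on the numbers in this post; the dictionary keys and function name are made up for illustration.

```python
# Per-step compute times from the post, in milliseconds.
steps_ms = {
    "binary": 1.73,
    "find_blobs_grayscale": 1.04,
    "crop": 0.82,
    "erode": 9.59,
    "dilate": 9.53,
    "find_blobs_binary": 0.94,
    "measure": 0.76,
}
snapshot_ms = 13.8  # time the camera needs for one snapshot


def fps(cycle_ms):
    """Frames per second for a cycle that lasts cycle_ms milliseconds."""
    return 1000.0 / cycle_ms


cycle_with = snapshot_ms + sum(steps_ms.values())
cycle_without = cycle_with - steps_ms["erode"] - steps_ms["dilate"]

print(round(cycle_with, 1), round(fps(cycle_with)))        # 38.2 26
print(round(cycle_without, 1), round(fps(cycle_without)))  # 19.1 52
```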

Is there something I can do to filter my image faster? Or is erode/dilate slow by nature?

import sensor, image, time, math

streamname = "stream-c1.bin"
stream = image.ImageIO(streamname, "r")
px_threshold = 1641
px_Stride = 23
thresholdGS = [(40, 255)]
thresholdBIN = [(1,1)]
opening_px = 4
total_compute_time = 0
n = 1

while(True):
    img = stream.read(copy_to_fb=True, loop=True, pause=True)
    aftersnapshottime = time.ticks_us()
    # Binary image, find the blob, crop and image opening
    img.binary(thresholdGS,to_bitmap=True)
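    blobroi = None  # reset each frame so a stale or undefined ROI is never used
    for blob in img.find_blobs(thresholdBIN, pixels_threshold=px_threshold, x_stride=px_Stride, y_stride=px_Stride, merge=True):
        blobx, bloby, blobw, blobh = blob.rect()
        blobroi = (blobx,bloby,blobw,blobh)
        break
    if blobroi is None:
        continue  # no blob in this frame, nothing to crop or measure
    img = img.crop(roi=blobroi)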
    img.erode(opening_px)
    img.dilate(opening_px)
    # Measure blob axis
    for blob in img.find_blobs(thresholdBIN, pixels_threshold=px_threshold, x_stride=px_Stride, y_stride=px_Stride, merge=True):
        maja = blob.major_axis_line()
        lmaja = round(math.sqrt((maja[2]-maja[0])**2 + (maja[3]-maja[1])**2), 1)
        mina = blob.minor_axis_line()
        lmina = round(math.sqrt((mina[2]-mina[0])**2 + (mina[3]-mina[1])**2), 1)
        print("BlobGS |" + " P:" + str(blob.pixels()) + " L:" + str(lmaja) + " l:" + str(lmina))
        break
    total_compute_time = total_compute_time + time.ticks_diff(time.ticks_us(), aftersnapshottime)
    average_ellapsed_compute_time = total_compute_time / n
    print("CoMS: " + str(round(average_ellapsed_compute_time/1000,2)))
    n = n + 1

stream-c1.zip (746 KB)

Erode and dilate of 4 is a 9x9 convolution on the image. It’s really expensive.

The triple buffering fix coming later this month (or next) should double your FPS, if you can wait for me.

Please reduce the erode and dilate size and try out the threshold argument. This is a feature OpenCV doesn’t have but it’s quite nice for getting the result you want with a smaller kernel size.

OK.
I will wait for the triple buffering, then, to see if it fits my needs.
Nevertheless, erode/dilate(4) does not seem to be an efficient way to do the job, but for the moment it is the only way I have found to remove the wire from the image.

Use the threshold argument for these methods. It will do what you want.

Also, don’t convert the image to a bitmap. This is harder for the processor to work on: even though the image is smaller, it has to pack and unpack bits. Leave the image as a grayscale image. This is the fastest.

E.g. remove “to_bitmap=True”
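
To illustrate why a packed bitmap can be slower despite being smaller, here is a rough pure-Python sketch comparing a pixel access in a one-byte-per-pixel row with one in a one-bit-per-pixel row. The helper names, the bit order, and the 8-bit packing are assumptions chosen for illustration, not the firmware's actual layout.

```python
def get_gray(row, x):
    # One byte per pixel: a single indexed load.
    return row[x]


def get_bit(packed_row, x):
    # One bit per pixel: load the byte, shift, then mask.
    return (packed_row[x >> 3] >> (x & 7)) & 1


def set_bit(packed_row, x, value):
    # Writing is a read-modify-write on the byte that holds the bit.
    mask = 1 << (x & 7)
    if value:
        packed_row[x >> 3] |= mask
    else:
        packed_row[x >> 3] &= ~mask & 0xFF


gray_row = bytearray([0, 255, 255, 0, 0, 255, 0, 0, 255, 0])

# Pack the same row at one bit per pixel (10 pixels fit in 2 bytes).
packed_row = bytearray((len(gray_row) + 7) // 8)
for x, value in enumerate(gray_row):
    set_bit(packed_row, x, value > 127)

unpacked = [get_bit(packed_row, x) for x in range(len(gray_row))]
print(len(gray_row), len(packed_row), unpacked)
```

The packed row uses fewer bytes, but every access needs the extra shift and mask work that the grayscale row avoids.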

Indeed. It was counter-intuitive to me, as I don’t know what is under the hood. Here are the differences:

  • Convert to binary: -0.18ms
  • Find blob (in grayscale): +0.08ms
  • Crop (from 34808px to 18696px): -0.49ms
  • img.erode(4): -1.83ms
  • img.dilate(4): -1.95ms
  • Find blob (in binary): +0.05ms
  • Measure: +0.01ms
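
For reference, applying these differences to the timings from the first post gives the new totals. This is a sketch on the posted numbers only; it assumes the 13.8ms snapshot time is unchanged, which the thread does not state.

```python
# Step order: binary, find blob (grayscale), crop, erode, dilate,
# find blob (binary), measure. All values in milliseconds.
baseline_ms = [1.73, 1.04, 0.82, 9.59, 9.53, 0.94, 0.76]   # with to_bitmap=True
delta_ms = [-0.18, 0.08, -0.49, -1.83, -1.95, 0.05, 0.01]  # after removing it
snapshot_ms = 13.8  # assumption: same snapshot time as before

new_compute = sum(b + d for b, d in zip(baseline_ms, delta_ms))
saved = -sum(delta_ms)
new_fps = 1000.0 / (snapshot_ms + new_compute)

print(round(saved, 2), round(new_compute, 2), round(new_fps, 1))  # 4.31 20.1 29.5
```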

I tried the threshold argument for erode; nothing happened until I set it to at least 50. And as I understand it, the threshold argument is meant more for targeting small isolated blobs than for removing hairy spikes from a blob.

import sensor, image, time, math

streamname = "stream-c1.bin"
stream = image.ImageIO(streamname, "r")
px_threshold = 1641
px_Stride = 23
thresholdGS = [(40, 255)]
thresholdBIN = [(128,255)]
opening_px = 4
total_compute_time = 0
n = 1

while(True):
    img = stream.read(copy_to_fb=True, loop=True, pause=True)
    aftersnapshottime = time.ticks_us()
    # Binary image, find the blob, crop and image opening
    img.binary(thresholdGS)
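    blobroi = None  # reset each frame so a stale or undefined ROI is never used
    for blob in img.find_blobs(thresholdBIN, pixels_threshold=px_threshold, x_stride=px_Stride, y_stride=px_Stride, merge=True):
        blobx, bloby, blobw, blobh = blob.rect()
        blobroi = (blobx,bloby,blobw,blobh)
        break
    if blobroi is None:
        continue  # no blob in this frame, nothing to crop or measure
    img = img.crop(roi=blobroi)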
    img.erode(opening_px)
    img.dilate(opening_px)
    # Measure blob axis
    for blob in img.find_blobs(thresholdBIN, pixels_threshold=px_threshold, x_stride=px_Stride, y_stride=px_Stride, merge=True):
        maja = blob.major_axis_line()
        lmaja = round(math.sqrt((maja[2]-maja[0])**2 + (maja[3]-maja[1])**2), 1)
        mina = blob.minor_axis_line()
        lmina = round(math.sqrt((mina[2]-mina[0])**2 + (mina[3]-mina[1])**2), 1)
        print("BlobGS |" + " P:" + str(blob.pixels()) + " L:" + str(lmaja) + " l:" + str(lmina))
        break
    total_compute_time = total_compute_time + time.ticks_diff(time.ticks_us(), aftersnapshottime)
    average_ellapsed_compute_time = total_compute_time / n
    print("CoMS: " + str(round(average_ellapsed_compute_time/1000,2)))
    n = n + 1

It will remove hairy spikes too.

I’m confused about what threshold is really meant to do. And there is no improvement in the computing time.

Here is my test code:

import image, time
thresholdBIN = [(128,255)]
img = image.Image("/Disk-Spikes-r00a-320x240-dots.pgm",copy_to_fb=True)
img.binary(thresholdBIN)
aftersnapshottime = time.ticks_us()
img.erode(1,threshold=7)
print("Time: " + str(round(time.ticks_diff(time.ticks_us(), aftersnapshottime)/1000,2)))
img.flush()
time.sleep_ms(500)
img.save("/Disk-Spikes-r00a-320x240-dots-erode_tests.pgm")

Original file:
Disk-Spikes-r00a-320x240-dots.png
img.erode(1) - OK understand:
Disk-Spikes-r00a-320x240-dots-erode-1-threshold-8.png
img.erode(1,threshold=0) - the smaller dot is not removed but the second smallest is smaller?:
Disk-Spikes-r00a-320x240-dots-erode-2-threshold-0.png
img.erode(1,threshold=1) - OK understand:
Disk-Spikes-r00a-320x240-dots-erode-2-threshold-1.png
img.erode(1,threshold=2):
Disk-Spikes-r00a-320x240-dots-erode-2-threshold-1.png
img.erode(1,threshold=3):
Disk-Spikes-r00a-320x240-dots-erode-1-threshold-3.png
img.erode(1,threshold=4):
Disk-Spikes-r00a-320x240-dots-erode-1-threshold-5.png
img.erode(1,threshold=5):
Disk-Spikes-r00a-320x240-dots-erode-1-threshold-4.png
img.erode(1,threshold=6):
Disk-Spikes-r00a-320x240-dots-erode-1-threshold-6.png
img.erode(1,threshold=7):
Disk-Spikes-r00a-320x240-dots-erode-1-threshold-8.png
img.erode(1,threshold=8):
Disk-Spikes-r00a-320x240-dots-erode-1-threshold-7.png
img.erode(1,threshold=9) - All set to black?:
Disk-Spikes-r00a-320x240-dots-erode-1-threshold-9.png
img.erode(2) - OK understand:
Disk-Spikes-r00a-320x240-dots-erode-2.png
img.erode(2,threshold=0) like img.erode(1,threshold=0) ?:
Disk-Spikes-r00a-320x240-dots-erode-2-threshold-0.png
img.erode(2,threshold=1) and further threshold ?:
Disk-Spikes-r00a-320x240-dots-erode-2-threshold-1.png

So, erode and dilate both do this…

They look at a region of NxN pixels where N is ((arg*2)+1).

They sum the number of pixels that are not black in that region.

Then erode sets the pixel to black unless the sum is above threshold.

And dilate sets the pixel to white if the sum is above threshold.

The threshold argument allows you to tune how many neighbors per window need to be set/clear.

This is powerful because this allows you to make erode only function if any neighbor is not set.
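
The rule described above can be sketched in plain Python for the erode case. This is a minimal, unoptimized illustration, not the firmware's imlib_erode_dilate(); the function name, the 0/1 list-of-lists image, the clamping of out-of-range coordinates to the image edge, and counting the centre pixel in the sum are all assumptions. The window side is taken as 2*size+1, matching the 9x9 kernel quoted for erode(4) earlier in the thread.

```python
def erode(img, size, threshold=None):
    """Return an eroded copy of img, a list of rows holding 0 or 1."""
    h, w = len(img), len(img[0])
    side = 2 * size + 1
    if threshold is None:
        # Assumed default: every pixel in the window must be set.
        threshold = side * side - 1
    out = [row[:] for row in img]
    for y in range(h):
        for x in range(w):
            if not img[y][x]:
                continue  # already black, erode cannot change it
            total = 0
            for dy in range(-size, size + 1):
                yy = min(max(y + dy, 0), h - 1)  # clamp to the image edge
                for dx in range(-size, size + 1):
                    xx = min(max(x + dx, 0), w - 1)
                    total += img[yy][xx]
            if total <= threshold:
                out[y][x] = 0  # not enough set pixels in the window
    return out


# Toy image: a 5x5 block with a one-pixel-thick "wire" sticking out.
img = [[0] * 9 for _ in range(7)]
for y in range(1, 6):
    for x in range(1, 6):
        img[y][x] = 1
img[3][6] = img[3][7] = 1


def count(image):
    return sum(sum(row) for row in image)


print(count(img))                         # 27
print(count(erode(img, 1)))               # 9
print(count(erode(img, 1, threshold=5)))  # 21
print(count(erode(img, 1, threshold=9)))  # 0
```

On this toy image a lower threshold removes the wire while clearing only the four corners of the block, and a threshold of 9 with size 1 clears everything. Whether that carries over to a real image depends on how thick the wire is relative to the kernel.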

Thanks for that clarification.

But I think threshold is not useful for my case.

The best result for removing spikes comes from leaving the threshold at its default value, (2*arg+1)^2-1.
And the threshold doesn’t seem to have any impact on the computation time. During the convolution, imlib_erode_dilate() goes through the whole image anyway in order to compare against the threshold, and IMAGE_PUT_GRAYSCALE_PIXEL_FAST(buf_row_ptr, x, COLOR_GRAYSCALE_BINARY_MIN) costs nothing compared to that.

What I do find strange is that erode(4) is not that much more expensive than erode(1). On a 320x240 image erode(4) costs 22.8ms and erode(1) costs 9.7ms, yet the kernel is 81 pixels in the first case and 9 in the second.

Yeah, we only load/unload the columns that change. This saves a lot of work. E.g. since only the first/last columns change, we only update the sum by 18 pixels versus 6 as the kernel slides across the image.
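
The column trick can be sketched in plain Python as follows. This is an illustration of the idea only, not the firmware code: the function names are hypothetical, and the sketch covers only interior pixels of one row so that no border handling is needed.

```python
def window_sums_naive(img, y, size):
    """Sum the full (2*size+1)^2 window at every interior x of row y."""
    w = len(img[0])
    sums = []
    for x in range(size, w - size):
        total = 0
        for yy in range(y - size, y + size + 1):
            for xx in range(x - size, x + size + 1):
                total += img[yy][xx]
        sums.append(total)
    return sums


def window_sums_sliding(img, y, size):
    """Same sums, but only the leaving and entering columns are re-read."""
    w = len(img[0])

    def column(x):
        return sum(img[yy][x] for yy in range(y - size, y + size + 1))

    # Full sum once, for the first interior position.
    total = sum(column(x) for x in range(0, 2 * size + 1))
    sums = [total]
    for x in range(size + 1, w - size):
        total += column(x + size) - column(x - size - 1)
        sums.append(total)
    return sums


def pixels_per_step(size):
    """Pixels touched per one-pixel slide: one column out, one column in."""
    return 2 * (2 * size + 1)


# Deterministic 0/1 test pattern, 12 rows by 20 columns.
img = [[1 if (x * 7 + y * 3) % 5 < 2 else 0 for x in range(20)] for y in range(12)]

print(window_sums_naive(img, 5, 4) == window_sums_sliding(img, 5, 4))  # True
print(pixels_per_step(4), pixels_per_step(1))                          # 18 6
```

With this update the per-pixel work grows with the kernel side rather than its area, so the expected ratio between erode(4) and erode(1) is nearer 18/6 = 3 than 81/9 = 9, which is in line with the measured 22.8ms versus 9.7ms.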

As I’m only working in triggered mode, I won’t see any difference with the triple buffering. Is that right?

Yeah, it won’t because you’d want to control the frame capture.