Python performance

WhyNot · August 25, 2022, 11:36am

Hello. I’m trying to use an OpenMV Cam (H7 board, OV7725 sensor, firmware 4.1.1) for a motion recognition project. I wrote the code and got 5 seconds of processing time for a single frame, which is totally unacceptable. I started investigating and found that a single multiplication in a nested loop (4800 iterations) adds a 1.5 ms delay. At a 480 MHz core clock, that means about 150 cycles per multiplication. Here is the test code:

# Python speed test

import pyb, sensor, image, time, cpufreq

sensor.reset()                      # Reset and initialize the sensor.
sensor.set_pixformat(sensor.RGB565) # Set pixel format to RGB565 (or GRAYSCALE)
sensor.set_framesize(sensor.QQQVGA)   # 80x60
sensor.skip_frames(time = 2000)     # Wait for settings take effect.
clock = time.clock()                # Create a clock object to track the FPS.

bzrp = pyb.Pin("P7",pyb.Pin.OUT_PP)
bzrn = pyb.Pin("P8",pyb.Pin.OUT_PP)

image_width = sensor.width()
image_height = sensor.height()
working_size = int(image_width*image_height)

r_buf = list((range(working_size)))
g_buf = list((range(working_size)))
b_buf = list((range(working_size)))

for i in range(working_size):
    r_buf[i] = 0
    g_buf[i] = 0
    b_buf[i] = 0

def bzr_toggle():   #piezo buzzer between P7 and P8
    if bzrp.value() == 0:
        bzrp.high()
        bzrn.low()
    else:
        bzrp.low()
        bzrn.high()


freqs = cpufreq.get_current_frequencies()
print("CPU",freqs[0],"MHz",", working size",working_size,"/ allocated",sensor.get_framebuffers(),"buffers")

cnt = 0
fps_rst = 10

# ------------------------------------------------------------------------
while(True):
    bzr_toggle()    #to track FPS without IDE
    clock.tick()                    # Update the FPS clock.
    #img = sensor.snapshot()         # Take a picture and return the image.
    imgarr = sensor.snapshot().bytearray()  #this works faster than unpacking RGB tuple


    test = 0
    #test image processing, takes 91 ms (11 fps) in good lighting conditions
    for j in range(image_height):       #60
        for i in range(image_width):    #80
            test += i   #adds 6ms
            test += j*3 #adds 8ms
            pixel_index = j*image_width+i
            byte_index = pixel_index<<1
            bt0 = imgarr[byte_index]
            bt1 = imgarr[byte_index+1]
            bt2 = imgarr[byte_index+1] #adds 7 ms, for example
            r_buf[pixel_index] += bt1&0xF8
            g_buf[pixel_index] += ((bt1&7)<<5)|((bt0&0xE0)>>3)
            b_buf[pixel_index] += (bt0&0x1F)<<3


    #track FPS and frame processing time
    cnt += 1
    fps = clock.fps()
    tms = 1000/fps
    time_msg = "{:.2f} fps, {:.2f} ms"
    print(cnt,">",time_msg.format(fps,tms))
    fps_rst -= 1
    if fps_rst == 0:
        clock.reset()
        fps_rst = int(fps)
        if fps_rst < 5: fps_rst = 5

Processing a single 80x60 frame takes 91 ms. Maybe I’m making some obvious mistake, because I’m new to Python, but still, is this normal? Is there a way to improve performance by at least 10 times?
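As a rough cross-check of the figures in this post, the measured times can be converted into CPU cycles per loop iteration. This is only back-of-the-envelope arithmetic, assuming the 480 MHz core clock and the 4800-pixel (80x60) frame quoted above; the helper name `cycles_per_iteration` is made up for illustration.

```python
CPU_HZ = 480_000_000   # core clock quoted in the post
PIXELS = 80 * 60       # QQQVGA frame, 4800 loop iterations

def cycles_per_iteration(total_ms, iterations=PIXELS, cpu_hz=CPU_HZ):
    """Convert a total loop time in milliseconds to CPU cycles per iteration."""
    seconds_per_iteration = (total_ms / 1000) / iterations
    return seconds_per_iteration * cpu_hz

mul_cycles = cycles_per_iteration(1.5)    # the single multiplication
frame_cycles = cycles_per_iteration(91)   # the whole per-pixel loop body
print(round(mul_cycles), round(frame_cycles))
```

With these inputs the multiplication comes out at about 150 cycles and the full loop body at about 9100 cycles per pixel, which is consistent with the numbers reported above.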

sencery · August 29, 2022, 10:45am

Hey,
In the part where you check for every frame with a nested for loop:

    for j in range(image_height):       #60
        for i in range(image_width):    #80
            test += i   #adds 6ms
            test += j*3 #adds 8ms
            ...

your time complexity increases from O(N) to O(N^3) (it is O(N) in the beginning because of the while loop above all).

You may want to think about decreasing the time complexity to get a much faster version of your code. Since there is no code between the two loops, you can, for instance, multiply the width and height values before the loop and iterate up to that product. This will reduce your complexity to O(N^2), which would be faster than O(N^3).

BehicMV · August 29, 2022, 1:18pm

Hi @sencery,
Your solution sounds great but does not actually work. When you reduce the number of for loops from 2 to 1, you increase the iteration count of the single for statement, which means the total iteration count does not change and the estimated execution time stays the same.

On the other hand, @WhyNot, the code looks fine. This slowness is caused by the OpenMV microcontroller. This device is way slower than a normal computer.
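The point about the loop count can be checked with a small sketch in plain Python. It assumes the same 80x60 frame as the original post; the function names are hypothetical. Both versions visit every pixel exactly once, so flattening the loops does not reduce the amount of work.

```python
WIDTH, HEIGHT = 80, 60

def count_nested(width, height):
    """Count iterations of the original two-level loop."""
    n = 0
    for j in range(height):
        for i in range(width):
            n += 1
    return n

def count_flat(width, height):
    """Count iterations of a single loop over width * height."""
    n = 0
    for pixel_index in range(width * height):
        j, i = divmod(pixel_index, width)  # recover row and column if needed
        n += 1
    return n

nested = count_nested(WIDTH, HEIGHT)
flat = count_flat(WIDTH, HEIGHT)
print(nested, flat)
```

Both counts are 4800, so any speed difference between the two forms would come from interpreter overhead per iteration, not from a change in complexity.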

kwagyeman · August 29, 2022, 2:39pm

…

While we allow per pixel access in Python you ARE NOT supposed to do image processing in Python at a per pixel level.

Edit the C firmware if you want to do the above or use our image processing library.

kwagyeman · August 29, 2022, 2:41pm

It looks like you are summing color channels? There’s a method that computes the histogram in our library.

Use it.
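A minimal sketch of what using the library instead of a per-pixel loop might look like. It assumes the OpenMV `get_statistics()` image method and the `l_mean()`, `a_mean()` and `b_mean()` accessors on its result, which report LAB channel means for an RGB565 image. The helper name `mean_lab_over_frames` is hypothetical, and this yields per-frame channel averages, not the per-pixel accumulation buffers from the original code.

```python
def mean_lab_over_frames(snapshot, n_frames):
    """Average the per-frame L, A and B means over n_frames snapshots.

    snapshot: a callable returning an OpenMV image object,
              for example sensor.snapshot (assumed, not called here).
    """
    l_sum = a_sum = b_sum = 0
    for _ in range(n_frames):
        # The per-pixel work happens inside the firmware, not in Python.
        stats = snapshot().get_statistics()
        l_sum += stats.l_mean()
        a_sum += stats.a_mean()
        b_sum += stats.b_mean()
    return (l_sum / n_frames, a_sum / n_frames, b_sum / n_frames)
```

On the camera, after the usual sensor setup, this would be called as `mean_lab_over_frames(sensor.snapshot, 10)`. The Python loop then runs once per frame rather than once per pixel.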
