Python performance

Hello. I’m trying to use an OpenMV Cam (H7 board, OV7725 sensor, firmware 4.1.1) for a motion recognition project. I wrote the code and found that processing a single frame takes 5 seconds, which is totally unacceptable. I started investigating and found that a single multiplication in a nested loop (4800 iterations) adds a 1.5 ms delay. At a 480 MHz core clock, that means about 150 cycles per multiplication. Here is the test code:

# Python speed test

import pyb, sensor, image, time, cpufreq

sensor.reset()                      # Reset and initialize the sensor.
sensor.set_pixformat(sensor.RGB565) # Set pixel format to RGB565 (or GRAYSCALE)
sensor.set_framesize(sensor.QQQVGA)   # 80x60
sensor.skip_frames(time = 2000)     # Wait for settings to take effect.
clock = time.clock()                # Create a clock object to track the FPS.

bzrp = pyb.Pin("P7",pyb.Pin.OUT_PP)
bzrn = pyb.Pin("P8",pyb.Pin.OUT_PP)

image_width = sensor.width()
image_height = sensor.height()
working_size = int(image_width*image_height)

# Per-pixel accumulators for each color channel, initialized to zero.
r_buf = [0] * working_size
g_buf = [0] * working_size
b_buf = [0] * working_size

def bzr_toggle():   #piezo buzzer between P7 and P8
    if bzrp.value() == 0:
        bzrp.high()
        bzrn.low()
    else:
        bzrp.low()
        bzrn.high()


freqs = cpufreq.get_current_frequencies()
print("CPU",freqs[0],"MHz",", working size",working_size,"/ allocated",sensor.get_framebuffers(),"buffers")

cnt = 0
fps_rst = 10

# ------------------------------------------------------------------------
while(True):
    bzr_toggle()    #to track FPS without IDE
    clock.tick()                    # Update the FPS clock.
    #img = sensor.snapshot()         # Take a picture and return the image.
    imgarr = sensor.snapshot().bytearray()  #this works faster than unpacking RGB tuple


    test = 0
    #test image processing, takes 91 ms (11 fps) in good lighting conditions
    for j in range(image_height):       #60
        for i in range(image_width):    #80
            test += i   #adds 6ms
            test += j*3 #adds 8ms
            pixel_index = j*image_width+i
            byte_index = pixel_index<<1
            bt0 = imgarr[byte_index]
            bt1 = imgarr[byte_index+1]
            bt2 = imgarr[byte_index+1] #adds 7 ms, for example
            r_buf[pixel_index] += bt1&0xF8
            g_buf[pixel_index] += ((bt1&7)<<5)|((bt0&0xE0)>>3)
            b_buf[pixel_index] += (bt0&0x1F)<<3


    #track FPS and frame processing time
    cnt += 1
    fps = clock.fps()
    tms = 1000/fps
    time_msg = "{:.2f} fps, {:.2f} ms"
    print(cnt,">",time_msg.format(fps,tms))
    fps_rst -= 1
    if fps_rst == 0:
        clock.reset()
        fps_rst = int(fps)
        if fps_rst < 5: fps_rst = 5

Processing a single 80x60 frame takes 91 ms. Maybe I’m making some obvious mistake, since I’m new to Python, but still, is this OK? Is there a way to improve performance by at least 10 times?
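For reference, the timing figures in the question can be checked with a few lines of arithmetic. This is only a sketch that uses the numbers quoted above (480 MHz, 4800 iterations, 1.5 ms and 91 ms); the constant and function names are made up for illustration.

```python
# Figures quoted in the question.
CORE_CLOCK_HZ = 480000000     # 480 MHz core clock
ITERATIONS = 60 * 80          # 4800 inner-loop iterations per frame


def cycles_per_iteration(delay_s, iterations=ITERATIONS, clock_hz=CORE_CLOCK_HZ):
    """Core clock cycles spent per loop iteration for a measured delay."""
    return delay_s / iterations * clock_hz


mult_cycles = cycles_per_iteration(1.5e-3)   # cost of the single multiplication
frame_cycles = cycles_per_iteration(91e-3)   # cost of the whole loop body

print(round(mult_cycles), "cycles per multiplication")
print(round(frame_cycles), "cycles per pixel for the full loop body")
```

So the 150-cycle estimate is consistent with the measurement, and the full loop body works out to roughly 19 µs per pixel.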

Hey,
In the part where you process every frame with a nested for loop:

    for j in range(image_height):       #60
        for i in range(image_width):    #80
            test += i   #adds 6ms
            test += j*3 #adds 8ms
            ...

your time complexity increases from O(N) to O(N^3) (it starts at O(N) because of the enclosing while loop).

You might consider reducing the time complexity to get a much faster version of your code. Since there is no code between the two loop headers, you can, for instance, multiply the width and height before the loop and iterate over a single range of that product. This will reduce your complexity to O(N^2), which would be faster than O(N^3).

Hi @sencery,
Your solution sounds great but does not actually work. When you reduce the number of for loops from 2 to 1, you increase the iteration count of the single remaining for statement, so the total number of iterations does not change and the estimated execution time stays the same.
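A quick way to see this is to count the iterations both ways. The sketch below uses the 80x60 frame size from the question; the function names are hypothetical and the loop bodies are reduced to a counter.

```python
IMAGE_WIDTH = 80
IMAGE_HEIGHT = 60


def nested_iterations(width, height):
    """Count loop-body executions for the original nested loops."""
    count = 0
    for j in range(height):
        for i in range(width):
            count += 1
    return count


def flat_iterations(width, height):
    """Count loop-body executions for a single loop over width * height."""
    count = 0
    for pixel_index in range(width * height):
        count += 1
    return count


nested = nested_iterations(IMAGE_WIDTH, IMAGE_HEIGHT)
flat = flat_iterations(IMAGE_WIDTH, IMAGE_HEIGHT)
print(nested, flat)
```

Both versions run the body 4800 times. A flat loop may save the `j*image_width+i` computation on each pixel, but that is a small constant factor, not a change in complexity.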

On the other hand, @WhyNot, the code looks fine. The slowness is caused by the OpenMV microcontroller. This device is much slower than a normal computer.

While we allow per-pixel access in Python, you ARE NOT supposed to do image processing in Python at a per-pixel level.

Edit the C firmware if you want to do the above or use our image processing library.

It looks like you are summing color channels? There’s a method that computes the histogram in our library.

Use it.
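As a rough sketch of what that could look like: the helper below asks the firmware for a histogram instead of looping over pixels in Python. The helper name `channel_histograms` is made up for illustration. `get_histogram()`, `l_bins()`, `a_bins()` and `b_bins()` are the method names as I understand them from the OpenMV `image` module docs, so check them against the docs for your firmware version. Note that for RGB565 images the histogram is reported in LAB channels rather than R, G, B, and that it describes the whole frame, which is not the same thing as the per-pixel accumulation across frames in the original code.

```python
def channel_histograms(img):
    """Return per-channel histogram bins computed by the firmware.

    Assumes `img` is an OpenMV RGB565 image, for which the histogram
    is split into L, A and B channels (LAB color space, not RGB).
    """
    hist = img.get_histogram()
    return hist.l_bins(), hist.a_bins(), hist.b_bins()


# On the camera (needs the OpenMV firmware, not desktop Python):
#     import sensor
#     img = sensor.snapshot()
#     l_bins, a_bins, b_bins = channel_histograms(img)
```

The point of the design is that the only Python-level work per frame is a handful of method calls, while the per-pixel loop runs inside the C firmware.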