How does img.scale work?

I trained a model on images gathered with the Arduino Nicla Vision.
After gathering the images, I trained the model with TensorFlow in Python and converted it to TFLite to deploy it on the Nicla Vision.
The model produces an output on one of the images from the training dataset, but the output is not correct. I traced this issue back to an incorrect input being given to the model.
The input to the model is a scaled-down version of the original grayscale image gathered with the Nicla.
So, if the image gathered with the Nicla Vision is 240x160, it is scaled down to 45x30, which then acts as the input to the model.

I verified that the pixel intensities in the down-sized image produced by Python OpenCV are different from those produced by the img.scale function in OpenMV. This difference causes the trained model to produce a different output.
I tried using the same interpolation methods in the respective libraries, but the difference still exists.
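For reference, this is roughly how I do the resize on the desktop side (the file names are just placeholders):

```python
import cv2

# Read the original 240x160 grayscale capture (file name is a placeholder).
orig = cv2.imread("nicla_capture.png", cv2.IMREAD_GRAYSCALE)

# cv2.resize takes (width, height), so 240x160 -> 45x30.
small_bilinear = cv2.resize(orig, (45, 30), interpolation=cv2.INTER_LINEAR)
small_area = cv2.resize(orig, (45, 30), interpolation=cv2.INTER_AREA)

cv2.imwrite("desktop_bilinear_45x30.png", small_bilinear)
cv2.imwrite("desktop_area_45x30.png", small_area)
```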

I need help determining how downscaling is implemented inside OpenMV so that I can implement the same thing in normal Python and have identical inputs to the model on both platforms (Python and MicroPython inside the OpenMV IDE).

I figured the downscaling might be implemented in draw.c under the imlib_draw_image function. It would really help me if you could explain the image resizing process so that I can implement the same thing in normal Python and get my model to produce the correct output.

Hi,

Our scaling feature doesn’t use floating point numbers. It’s all fixed-point math with limited precision, so it’s not going to produce results that perfectly match desktop Python. However, it is implemented mathematically correctly, so you should only notice differences in the LSBs of pixels. The MSBs should be the same.
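For example, a fixed-point blend and a floating point blend of the same pixels can disagree by about one bit in the last place. This is just an illustration of the general effect, not our actual code:

```python
def blend_float(a, b, frac):
    # Straightforward floating point linear blend of two pixel values.
    return int(round(a * (1.0 - frac) + b * frac))

def blend_fixed(a, b, frac, bits=8):
    # Same blend with the fraction quantized to 'bits' fractional bits and
    # integer-only arithmetic; the truncating shift can lose the last bit.
    f = int(frac * (1 << bits))
    return (a * ((1 << bits) - f) + b * f) >> bits

for a, b, frac in [(100, 130, 0.3), (7, 250, 0.6), (40, 41, 0.5)]:
    print(a, b, frac, blend_float(a, b, frac), blend_fixed(a, b, frac))
```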

For our CNN code in py_tf we just use bilinear scaling to scale the image down before feeding it to the network. Area scaling produces the nicest-looking image, but it’s slower than bilinear. We avoid nearest neighbor because the jaggies it produces cause issues.

Anyway, all of the scaling methods are pretty textbook and match what OpenCV does, just with different precision.

Does this help? Can you be more specific in your question?

Hey @kwagyeman,
Yeah, I will explain the situation with an example.
Please refer to this Excel sheet for reference: (openmv_scaling_down_compare - Google Sheets)
I have used the attached image for all the calculations.
Find the image inside this zip:
eg.zip (30.4 KB)

The attached image was read in on both platforms. The first three columns in the Excel sheet compare the pixel intensities read by Python and MicroPython; we see there is no difference in the pixel intensities of the original image.
The next two sets of three columns compare the pixel intensities after scaling the image down. For now I have compared the bilinear and area interpolation methods.

If you look at the data in these columns from rows 270 to 300, the pixel intensities for both bilinear and area interpolation differ considerably between the two platforms.
A pixel difference of around 30 is huge; even after normalizing by 255, it amounts to a difference of around 12 percent. Can a difference in precision alone cause intensity differences of this magnitude?
There are other parts of the columns where we also see significant differences.
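The comparison in the sheet boils down to something like this (file names are placeholders for the camera-scaled and desktop-scaled images):

```python
import cv2
import numpy as np

# Scaled-down image saved from the camera vs. the one produced on the desktop.
cam = cv2.imread("openmv_bilinear_45x30.png", cv2.IMREAD_GRAYSCALE)
desk = cv2.imread("desktop_bilinear_45x30.png", cv2.IMREAD_GRAYSCALE)

diff = cam.astype(np.int16) - desk.astype(np.int16)
print("max abs diff:      ", int(np.abs(diff).max()))
print("mean abs diff:     ", float(np.abs(diff).mean()))
print("pixels off by > 5: ", int((np.abs(diff) > 5).sum()))
print("worst case / 255:  ", np.abs(diff).max() / 255.0)
```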

Therefore, when an image with many differing pixel intensities is given to the model, it doesn’t even come close to the correct answer. I need a way to minimize this difference in pixel intensities; an error of up to 5 intensity levels would be acceptable. In short, I am looking for a way to minimize it so that image-based regression models work well on OpenMV devices.

Hi,

I see why this is happening. The two algorithms use different sample points when downscaling. What we’re doing on the OpenMV Cam is correct, but it may not match up exactly with what you are doing on the desktop. Notice how the errors spike cyclically; this is probably due to the image shifting in position.
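To illustrate what I mean by different sample points (which convention a given library uses is an implementation detail; this is only an illustration):

```python
src_w, dst_w = 240, 45
scale = src_w / dst_w  # 5.333...

for x in range(6):
    corner = x * scale                 # samples measured from pixel corners
    center = (x + 0.5) * scale - 0.5   # samples measured from pixel centers
    # The integer part picks the source pixels, the fractional part sets the
    # blend weights; the two conventions disagree by a repeating offset, which
    # is why the errors come and go cyclically across the image.
    print(x, round(corner % 1, 3), round(center % 1, 3))
```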

Can you post the original image and the scaled-down images? I’d like to inspect them. Finally, you can scale the image yourself using area scaling before feeding it into the net if you’d like to bypass the bilinear scaling that we do by default.
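Something along these lines should work on the camera side. This is a rough sketch that assumes image.AREA is available as a scaling hint and that img.scale() accepts x_scale/y_scale keywords on your firmware:

```python
import sensor, image

sensor.reset()
sensor.set_pixformat(sensor.GRAYSCALE)
sensor.set_framesize(sensor.HQVGA)   # 240x160
sensor.skip_frames(time=2000)

img = sensor.snapshot()
# Scale 240x160 -> 45x30 in place with the area hint before running the
# network, so the default bilinear downscale is bypassed.
img.scale(x_scale=45 / 240, y_scale=30 / 160, hint=image.AREA)
```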

Please note that it’s not my intention to match the desktop scaling code, so don’t expect me to change this code; matching the same scaling is not our goal. To get any speed at all on the MCU we had to switch to fixed-point math along with other tricks. Our goal was to produce something that’s visually correct. So, if there are obvious scaling errors that produce incorrect images, I will fix those. But your network should not have any problems with the differences between the images as long as the image is visually correct.

By visually correct I mean that the scaled down image does not have any image artifacts, massive pixel shifts, etc. These issues are usually very apparent.

As of my last debug of this code, though, I believe all of the scalers work correctly and don’t produce image artifacts.

You are right!
Visually there is no difference between the scaled-down images on the two platforms.
images.zip (35.3 KB)
images.zip contains copies of the scaled-down and original images.


Yeah, I completely understand. That’s why, in my first post, I asked if you could explain the rescaling method so that I could implement it in normal Python for my use case.
It would really help me if you could shed more light on how you sample points when downscaling and how you use fixed-point math while doing so. I will try to replicate the exact same process in Python for my project.

Here’s the grayscale bilinear code: https://github.com/openmv/openmv/blob/master/src/omv/imlib/draw.c#L4390

The RGB565 version is right above it. The code is heavily optimized though, so it might take you a while to unpack what’s going on.

The key to understanding the code is the __SMLAD instruction. Once you understand what that does, it all makes sense.
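In Python terms, __SMLAD behaves roughly like this (just a model of the intrinsic, not our code):

```python
def s16(v):
    """Interpret the low 16 bits of v as a signed 16-bit integer."""
    v &= 0xFFFF
    return v - 0x10000 if v & 0x8000 else v

def smlad(x, y, acc):
    # Two signed 16-bit multiplies (low halves and high halves) plus an
    # accumulate, all in one instruction on the Cortex-M.
    return s16(x) * s16(y) + s16(x >> 16) * s16(y >> 16) + acc

# Example: blend pixels 100 and 200 with weights 192 and 64 (out of 256).
pixels = (200 << 16) | 100
weights = (64 << 16) | 192
print(smlad(pixels, weights, 0) >> 8)   # (100*192 + 200*64) / 256 = 125
```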

Other things… the code aggressively caches results so the inner loop can be as tight as possible.

Finally, the scaling is done using fixed-point math: the top 16 bits are the integer pixel position and the bottom 16 bits are the fractional position of the pixel, which determines how the blending is done.
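Here’s a rough Python sketch of that scheme. It’s not a line-for-line port of the optimized C in draw.c, and the exact sample-point offsets and rounding in the firmware differ, but it shows the 16.16 stepping and fraction-driven blending:

```python
import numpy as np

def bilinear_downscale_fixed(src, dst_w, dst_h):
    # 16.16 fixed-point source stepping: top 16 bits = integer pixel position,
    # bottom 16 bits = fractional position used for blending.
    src_h, src_w = src.shape
    x_step = (src_w << 16) // dst_w
    y_step = (src_h << 16) // dst_h
    dst = np.zeros((dst_h, dst_w), dtype=np.uint8)

    y_acc = 0
    for y in range(dst_h):
        y0 = min(y_acc >> 16, src_h - 1)
        y1 = min(y0 + 1, src_h - 1)
        y_frac = (y_acc & 0xFFFF) >> 8   # keep 8 fractional bits for blending

        x_acc = 0
        for x in range(dst_w):
            x0 = min(x_acc >> 16, src_w - 1)
            x1 = min(x0 + 1, src_w - 1)
            x_frac = (x_acc & 0xFFFF) >> 8

            # Blend the two pixels on each source row, then blend the rows.
            top = (int(src[y0, x0]) * (256 - x_frac) + int(src[y0, x1]) * x_frac) >> 8
            bot = (int(src[y1, x0]) * (256 - x_frac) + int(src[y1, x1]) * x_frac) >> 8
            dst[y, x] = (top * (256 - y_frac) + bot * y_frac) >> 8

            x_acc += x_step
        y_acc += y_step
    return dst
```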

Thank you so much! I will give it a try.