jpeg compression algorithm

Hi guys
So I’m trying to implement the JPEG compression algorithm on the OpenMV board for a personal project. It’s mostly the same code that you have in your jpeg.c file. I ported most of it, but I can’t get a conversion time as good as yours and I’m out of ideas.
I have a 320x240 image and it’s taking about 110 ms to convert it from RGB565 to JPEG.
In fact, the jpeg_processDU function itself takes ~80 us for one 8x8 macro block in my program: 80 us x 40 x 30 blocks ~ 96 ms.
My sysclk is running at 200 MHz. As far as I can tell, you are not using any lookup tables for the DCT and there are a lot of floating-point calculations.
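
For reference, the per-frame estimate above can be reproduced with a few lines of C. This is only a sketch of the arithmetic in this post: the function names are hypothetical and are not taken from jpeg.c, and it assumes every 8x8 block costs the same measured time.

```c
#include <stdint.h>

/* Number of 8x8 blocks needed to cover a width x height plane
 * (dimensions that are not multiples of 8 are rounded up). */
static uint32_t block_count(uint32_t width, uint32_t height)
{
    return ((width + 7u) / 8u) * ((height + 7u) / 8u);
}

/* Total time in microseconds if every block costs us_per_block. */
static uint32_t frame_time_us(uint32_t width, uint32_t height,
                              uint32_t us_per_block)
{
    return block_count(width, height) * us_per_block;
}

/* CPU cycles spent per block: microseconds multiplied by the clock in MHz. */
static uint32_t cycles_per_block(uint32_t us_per_block, uint32_t sysclk_mhz)
{
    return us_per_block * sysclk_mhz;
}
```

With the numbers from this post, block_count(320, 240) is 1200, frame_time_us(320, 240, 80) is 96000 us, and cycles_per_block(80, 200) is 16000 cycles for each 8x8 block.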

Do you have any tips on how you guys are achieving 30fps? Are you using the Chrom-ART Accelerator for any image processing, or anything else that is not obvious to me from just looking at the jpeg.c file?

Any help would be appreciated.

What quality level are you running at and did you keep the code buffers in the same places? The YUV LUT needs to be accessible pretty quickly along with the RAM buffers being cached.
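
To illustrate what the colour conversion step has to do per pixel, here is a minimal arithmetic sketch of an RGB565 to YCbCr conversion. It is not the OpenMV LUT. It assumes native-endian RGB565 with red in the top 5 bits, full-range JFIF-style YCbCr, and coefficients scaled by 256. The `ycbcr_t` type and `rgb565_to_ycbcr` name are hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper type for one converted pixel. */
typedef struct {
    uint8_t y, cb, cr;
} ycbcr_t;

/* Convert one RGB565 pixel to 8-bit Y, Cb, Cr using integer math only.
 * Coefficients are the JFIF ones multiplied by 256 and rounded. */
static ycbcr_t rgb565_to_ycbcr(uint16_t pixel)
{
    uint32_t r5 = (pixel >> 11) & 0x1Fu;
    uint32_t g6 = (pixel >> 5) & 0x3Fu;
    uint32_t b5 = pixel & 0x1Fu;

    /* Expand 5/6-bit channels to 8 bits by replicating the top bits. */
    int32_t r = (int32_t)((r5 << 3) | (r5 >> 2));
    int32_t g = (int32_t)((g6 << 2) | (g6 >> 4));
    int32_t b = (int32_t)((b5 << 3) | (b5 >> 2));

    ycbcr_t out;
    /* Luma is rounded; the +32768 on chroma is the 128 offset times 256,
     * which also keeps the sum non-negative before the shift. */
    out.y  = (uint8_t)((77 * r + 150 * g + 29 * b + 128) >> 8);
    out.cb = (uint8_t)((-43 * r - 85 * g + 128 * b + 32768) >> 8);
    out.cr = (uint8_t)((128 * r - 107 * g - 21 * b + 32768) >> 8);
    return out;
}
```

A precomputed table trades this arithmetic for memory reads, which is why the placement and caching of the table and buffers matters.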

It’s probably the subsampling.

Thanks for input. I will check these.
I noticed that I only enabled the ICache and not the DCache in my program, so I need to figure out how to enable it without causing any inconsistencies in RAM.
I tried quality levels of 70% and 60%.
Also, the YUV LUT access happens before the DCT. What I’m struggling with is that, even ignoring the RGB to YUV conversion, it still takes ~80 ms for just the DCT + zig-zagging + Huffman coding.

I’m taking this one step at a time. If I can get just the DCT + zig-zagging + Huffman coding fast, then I can target the RGB to YUV conversion.
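
As a small illustration of the zig-zag stage mentioned above, here is a sketch that reorders 64 coefficients with a lookup table. The table is the standard JPEG zig-zag scan order. The names `zigzag_to_natural` and `zigzag_reorder` are hypothetical and not taken from jpeg.c, which may fold this step into quantization.

```c
#include <stdint.h>

/* zigzag_to_natural[i] is the row-major index of the coefficient that
 * appears at position i of the JPEG zig-zag scan. */
static const uint8_t zigzag_to_natural[64] = {
     0,  1,  8, 16,  9,  2,  3, 10,
    17, 24, 32, 25, 18, 11,  4,  5,
    12, 19, 26, 33, 40, 48, 41, 34,
    27, 20, 13,  6,  7, 14, 21, 28,
    35, 42, 49, 56, 57, 50, 43, 36,
    29, 22, 15, 23, 30, 37, 44, 51,
    58, 59, 52, 45, 38, 31, 39, 46,
    53, 60, 61, 54, 47, 55, 62, 63
};

/* Copy a row-major 8x8 block of coefficients into zig-zag order. */
static void zigzag_reorder(const int16_t natural[64], int16_t zigzag[64])
{
    for (int i = 0; i < 64; i++) {
        zigzag[i] = natural[zigzag_to_natural[i]];
    }
}
```

Because the table is constant, this step costs one table read and one copy per coefficient, so it is usually a minor part of the per-block time compared with the DCT.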

Let me take a look at my subsampling and the caching of the buffers to see if that makes a difference.

Again, thank you for the suggestions. You guys are so much help…

Update: thanks to both your suggestions, I am now able to get the jpeg compression time for a 320x240 image to around 30ms.
I had to enable the DCache and also introduce 2x2 jpeg subsampling to get there. But the quality does degrade a bit from no subsampling (1x1) which I guess is the trade off.
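
A rough sketch of why 2x2 subsampling helps: a 16x16 pixel area needs 4 Y + 4 Cb + 4 Cr = 12 blocks with 1x1 sampling, but only 4 Y + 1 Cb + 1 Cr = 6 blocks with 2x2, so roughly half the DCT and Huffman work. The code below averages a 16x16 chroma area down to one 8x8 block. The function names are hypothetical, and simple averaging is an assumption; jpeg.c may pick or filter samples differently.

```c
#include <stdint.h>

/* Blocks to encode per 16x16 pixel area: 4 Y + 4 Cb + 4 Cr without
 * subsampling, 4 Y + 1 Cb + 1 Cr with 2x2 chroma subsampling. */
static unsigned blocks_per_16x16(int chroma_2x2)
{
    return chroma_2x2 ? 6u : 12u;
}

/* Average each 2x2 group of a 16x16 chroma area into one 8x8 block.
 * src points at the top-left sample, src_stride is the row pitch in bytes. */
static void chroma_subsample_2x2(const uint8_t *src, int src_stride,
                                 uint8_t dst[64])
{
    for (int y = 0; y < 8; y++) {
        const uint8_t *row0 = src + (2 * y) * src_stride;
        const uint8_t *row1 = row0 + src_stride;
        for (int x = 0; x < 8; x++) {
            unsigned sum = row0[2 * x] + row0[2 * x + 1]
                         + row1[2 * x] + row1[2 * x + 1];
            dst[y * 8 + x] = (uint8_t)((sum + 2u) >> 2); /* rounded mean */
        }
    }
}
```

The quality loss comes from the chroma planes keeping only one sample per four pixels, which is most visible on sharp colour edges.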

So I assume that when I see the video in the openMV GUI, that is also from a 2x2 subsampling?

Another question: I see that your new module uses an H7. It has 1MB of RAM but only 512KB is contiguous, so you would still not be able to store an RGB565 image in it for a DMA transfer. So, other than the 400MHz processor speed, is there any other big processing advantage for which you guys upgraded? Very curious.

Um, the extra RAM really helps. So, all of our extra buffers now are larger which allow us to store higher quality JPEG images. Additionally, the H7 does JPEG compression in hardware making it a lot faster at it than the M7. Anyway, the JPEG images look a lot better on the H7.

Generally, the theme of the H7 is that we finally have enough RAM to hit the spec goals we had for the idea of the OpenMV Cam when we first started the whole project.

I remember the STM32F767 also having the JPEG encoder, but the CPU had to feed it the 8x8 macro blocks. Did they fix that in the H7 series? If so, that is cool. Performing it in hardware would be quite fast. How long would it take the H7 to compress a 640x480 image in HW? I gotta read the docs.

Um, we can do about 20 FPS at 640x480. The JPEG hardware is not what is slow either. It actually takes only 1 ms for the HW to JPEG compress the image. However, it takes ~49 ms for the CPU to feed the hardware data. ST forgot to HW accelerate the JPEG encode path. They only really accelerated the decode path.

Interesting. I was just reading the app note AN4996 and I attached a snapshot of the encoding parameters to this post.

They specify 4ms for a 640x480 conversion with an H7 and F7, but mention that the RGB to YCbCr conversion would take 58ms. Now if you get the data in YCbCr format directly, which the OV7725 can do, you could eliminate this 58ms. That’s what we did as well to eliminate the RGB to YCbCr conversion time and get to 30ms, so we didn’t use the YUV table.

But they do not mention how long it would take for the CPU to feed the jpeg HW encoder the 8x8 macro blocks.
So when you say HW accelerate the encode path, do you mean providing a DMA controller to feed the data from memory to the JPEG HW encoder? I ask because RM0433, sec 32.3.2, mentions that the JPEG internal signals can trigger the MDMA. I’ve never used the MDMA, but I’m curious whether it can help in the feeding process. You guys must have already gone through this. So just curious…

The problem is that the camera doesn’t generate the data in 8x8 blocks. It generates scanlines. So you still have to rearrange the data using the CPU. ST made the hardware just for video playback, so it’s quite deficient at encoding.
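
To make the rearrangement concrete, here is a minimal sketch of pulling one 8x8 block out of a buffer of scanlines. It assumes a single 8-bit plane (for example luma) stored row by row, with at least 8 complete scanlines buffered. The function name and parameters are hypothetical and not taken from the OpenMV firmware or the ST HAL.

```c
#include <stddef.h>
#include <stdint.h>

/* Copy the 8x8 block at block coordinates (bx, by) out of a row-major
 * 8-bit plane and apply the JPEG level shift of -128 to each sample.
 * stride is the length of one scanline in bytes. */
static void extract_block_8x8(const uint8_t *plane, int stride,
                              int bx, int by, int8_t out[64])
{
    const uint8_t *row = plane + (size_t)by * 8u * (size_t)stride
                               + (size_t)bx * 8u;
    for (int y = 0; y < 8; y++) {
        for (int x = 0; x < 8; x++) {
            out[y * 8 + x] = (int8_t)(row[x] - 128);
        }
        row += stride; /* next scanline of the same block */
    }
}
```

Each block touches 8 different scanlines, so the source reads jump by a full line every 8 bytes. That access pattern is the part the CPU has to handle before the encoder can be fed.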

Yes, good point. The HW is optimized for video playback but not for streaming video from a camera.
So 20fps for 640x480 is not too shabby? The video should be smooth enough, right?
Also, why did you guys pick the H7 image sensor from On Semi over the older OV7725?

We support both the MT9V034 and the OV7725. The new system can actually support any sensor, so we expect folks to mod the system heavily to add their own camera variants. Video recording wasn’t really our goal anyway, so we’re not too concerned about it. Anyway, capturing is smooth as long as you’re streaming to the PC. We don’t have any buffer on the camera to deal with file system erase/write operations, so you’ll get jitters when recording to the SD card, since it likes to block data transfer while erasing, which causes dropped frames. So, just stream the video to an SBC or something. Snapshots work great though.

So the only difference between an H7 and an F7 when it comes to pushing the 8x8 macro blocks is the processor speed: 200MHz vs 400MHz.
Awesome! I’m definitely going to order myself an H7 board.

Thanks for all the answers. Appreciate it.

The H7 has a JPEG hardware compressor so it’s actually a lot faster.

Are the DCT coefficients from the H7’s hardware compression process exposed for use by the Python code? That would be useful…

Again, plus one for the DCT availability comment.

I will add a method that gives you the histogram of the FFT magnitude of the image (and histogram of the phase). You’ll be able to select it for an ROI. We have to get the current release out the door for the H7 however so this has to wait a few weeks.