GENX320 - streaming raw event data over USB

Hello, I’m running into the same issues as the ones above and I have a couple of questions.

First, even with the biases tuned to produce as few events as possible (within the Prophesee limits), events are being dropped in raw event mode. I’m using a GenX320 + RT1062 with the latest development firmware (5.0.0), which includes the new raw event mode. This did help a lot, allowing for a max of ~3.2M events/s (with EVT_RES = 65536), but under heavy camera motion it still produces large gaps in time with 0 events. I understand 3.2 million is already quite good (I really appreciate all of your improvements so far!), but I’m trying to use the camera in a very dynamic context and this limit is reached quite frequently. Is there any way to make this fail more gracefully? For example, dropping some events would be much better than producing nothing.

The real bottleneck seems to be the streaming rate. Both when recording raw events to an SD card and when streaming over USB, I get around 12.8 MB/s max (you mentioned you achieved about the same), even though in theory both USB-C (60 MB/s) and micro SD (25 MB/s) should support higher data rates, so upgrading to the PoE shield doesn’t seem like it would solve the problem. I tried the dual-threaded solution you proposed above, with one thread receiving data and the other recording it, but the PC-side RAM buffer never even started filling up while the blackouts still occurred, so I think the actual USB communication speed is the bottleneck.

I also tried streaming in histogram mode, but over USB-C I was only able to receive ~90 FPS (9.5 MB/s), and when recording to micro SD I got ~66 FPS (6.8 MB/s), instead of the 300 FPS I set. Running something like the “genx320_histogram_grayscale_mode.py” example does show that 300 FPS is achievable on the board, but I can’t seem to get the data off it that fast to another device (over USB or SD). I haven’t seen any posts about this limitation in histogram mode, so am I doing something wrong, or is it a known limitation?
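A quick back-of-envelope check helps here. It assumes each histogram frame is shipped as an uncompressed 320×320, 8-bit grayscale image (102,400 bytes) with no framing overhead; both are assumptions, not something measured in this thread:

```python
# Back-of-envelope bandwidth for uncompressed histogram frames.
# Assumption: 320x320 sensor, 1 byte per pixel, no protocol/framing overhead.
WIDTH, HEIGHT, BYTES_PER_PIXEL = 320, 320, 1
FRAME_BYTES = WIDTH * HEIGHT * BYTES_PER_PIXEL  # 102,400 bytes per frame


def required_mb_per_s(fps: float) -> float:
    """Data rate (MB/s, 1 MB = 1e6 bytes) needed to ship `fps` frames per second."""
    return fps * FRAME_BYTES / 1e6


def max_fps(link_mb_per_s: float) -> float:
    """Frames per second a link of the given speed can carry."""
    return link_mb_per_s * 1e6 / FRAME_BYTES


for fps in (300, 90, 66):
    print(f"{fps:>3} FPS -> {required_mb_per_s(fps):5.2f} MB/s")
print(f"12.8 MB/s link -> {max_fps(12.8):.0f} FPS max")
```

Under those assumptions, 300 FPS needs about 30.7 MB/s, well above the ~12.8 MB/s that gets off the board in raw event mode, while 90 and 66 FPS work out to roughly 9.2 and 6.8 MB/s, close to the rates reported above.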

Hmm,

So… putting EVT_RES at 65536 and increasing the FIFO buffer count are kinda the knobs that you have. Like, how many FIFO buffers did you configure?

65536 events is 256 KB of data (at 4 bytes per event). Each buffer write would be that size, and SD card writes of that size should reach nearly 20 MB/s. However, you also need to make the FIFO deep enough to absorb everything that arrives while the card catches up.

As for the 0-event gaps: the default behavior on FIFO overflow is to flush all buffers. You need to create the csi object on the camera with “fflush=False” (see the csi module page in the OpenMV MicroPython 1.28 documentation).

Once you disable that, FIFO buffers don’t flush anymore, but now you’ll end up with stale events. However, nothing should drop. You should be able to set the number of buffers to something like 64 without running out of RAM.
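To size the FIFO, the buffer-size and rate numbers from this thread can be combined. This is a rough sketch assuming 4 bytes per event (EVT2.0), the 50 MB/s camera input quoted further down, and a 12.8 MB/s drain; real buffers carry some overhead, so treat the results as estimates:

```python
BYTES_PER_EVENT = 4  # EVT2.0: one 32-bit word per event (assumption for other formats)


def buffer_bytes(evt_res: int, bytes_per_event: int = BYTES_PER_EVENT) -> int:
    """Size of one event buffer holding `evt_res` events."""
    return evt_res * bytes_per_event


def burst_seconds(n_buffers: int, evt_res: int, in_mb_s: float, out_mb_s: float) -> float:
    """How long a full-rate burst can last before the FIFO is full.

    The FIFO fills at (input - output); if the output keeps up, it never fills.
    """
    surplus = (in_mb_s - out_mb_s) * 1e6  # bytes/s accumulating in RAM
    if surplus <= 0:
        return float("inf")
    return n_buffers * buffer_bytes(evt_res) / surplus


print(buffer_bytes(65536))  # 262144 bytes = 256 KiB
print(round(burst_seconds(16, 65536, 50.0, 12.8), 3))
print(round(burst_seconds(64, 65536, 50.0, 12.8), 3))
```

With 16 buffers the FIFO absorbs roughly 0.11 s of full-rate input; with 64 buffers (16 MiB) roughly 0.45 s. A deeper FIFO only buys time for bursts; sustained input above the drain rate still fills it eventually.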

See this new guide I wrote on all of this: 7.15. Framebuffers - OpenMV MicroPython 1.28 documentation

However, while the image-capture DMA pipeline can keep up writing to RAM, you still have to get the data off the system. Even though we are getting 12 MB/s off the system, the camera input is faster, so you overflow. The same goes for writing to an SD card at 20 MB/s.

The camera input is 50 MB/s. You need to reduce the camera’s data rate if you don’t want to drop data.
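Expressed in events per second (assuming 4-byte EVT2.0 events and ignoring framing), the numbers in this thread line up: the ~12.8 MB/s that actually gets off the board corresponds to the ~3.2M events/s ceiling from the first post, while a 50 MB/s camera input can produce about four times that. A small sketch of the conversion:

```python
def events_per_second(mb_per_s: float, bytes_per_event: float = 4) -> float:
    """Event throughput a byte rate can carry (1 MB = 1e6 bytes)."""
    return mb_per_s * 1e6 / bytes_per_event


# Rates quoted in this thread; the SD figure is the ~20 MB/s estimate above.
for label, rate in (("USB / measured", 12.8), ("SD card (est.)", 20.0), ("camera input", 50.0)):
    print(f"{label:<15} {rate:5.1f} MB/s -> {events_per_second(rate) / 1e6:5.2f} M events/s")
```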

As for getting 60 MB/s over USB: that’s theoretically possible; however, it would require the USB controller’s DMA to transfer data directly from the camera DMA’s output buffers. You’re pretty much bypassing the USB stack at this point. TinyUSB currently copies the data to send into an internal FIFO, which the USB controller then services. This extra CPU involvement is what is hurting the max speed.

If one day TinyUSB supports a direct cut-through, then we’ll be able to get to these higher speed levels. But right now, this isn’t possible.

Note that we’ve already increased the FIFO buffer size on the RT1062 to 4096 bytes, and I’ve sniffed the USB packets to make sure there are no gaps between 512-byte packets on the bus. The current system uses USB bandwidth efficiently during a transaction; the problem is the idle time between transactions.
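To put a rough number on that idle time: if the bus moved each 4096-byte transaction at ~60 MB/s (an optimistic assumption; the practical USB 2.0 high-speed bulk limit is closer to 53 MB/s), the measured 12.8 MB/s average implies roughly 250 µs of idle time per transaction. This is a simple illustrative model with hypothetical helper names, not a measurement:

```python
def implied_gap_us(txn_bytes: int, bus_mb_s: float, measured_mb_s: float) -> float:
    """Idle time per transaction implied by a measured average throughput.

    Model (assumption): each transaction moves `txn_bytes` at `bus_mb_s`,
    then the bus idles for a fixed gap, so
    average rate = txn_bytes / (active_time + gap).
    """
    active_s = txn_bytes / (bus_mb_s * 1e6)
    total_s = txn_bytes / (measured_mb_s * 1e6)
    return (total_s - active_s) * 1e6


def throughput_mb_s(txn_bytes: int, bus_mb_s: float, gap_us: float) -> float:
    """Average throughput for a given per-transaction gap (inverse of the above)."""
    active_s = txn_bytes / (bus_mb_s * 1e6)
    return txn_bytes / (active_s + gap_us * 1e-6) / 1e6


gap = implied_gap_us(4096, 60.0, 12.8)
print(f"implied idle time per 4096-byte transaction: {gap:.0f} us")
print(f"round trip check: {throughput_mb_s(4096, 60.0, gap):.1f} MB/s")
```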

Thinking about this more… yeah, so, like, the solution is to switch to the EVT3.0 protocol, which vectorizes the events to reduce the data rate.

At the end of the day, the camera is faster at putting stuff in RAM than at getting stuff out of RAM.

Without reducing the data rate, the problem of getting off the system isn’t really solvable.

Do you have access to the Prophesee GenX320 datasheet and user manual? They describe the more advanced formats there.

I’ve got access to both the datasheet and user manual, so I could take a crack at implementing the EVT 3.0 protocol. Just to confirm, the OpenMV GENX320 model is the MP (Mass-Produced) model, correct? In the Prophesee docs (https://docs.prophesee.ai/stable/data/encoding_formats/index.html), they state that only the MP model supports EVT 3.0, whereas the ES (Engineering Sample) model only supports EVT2.0 and EVT2.1.

Yeah, I’ve never seen the ES in the wild. We are using the MP.

For the raw format, since it’s not processed, you’d just need to add another ioctl that enables raw mode and sets the format to EVT3.0 instead of 2.0.

As for processing the data, you’ll have to handle that on the other side.

I had the FIFO buffer count at 16 and was able to increase it a lot. With fflush=False, recording is much better, but of course this doesn’t fully solve the issue for real-time use. I don’t have access to the datasheet yet; I’m using a camera from my university, so I’ll have to request access through them. @shulkinj if you’re happy to take a look at EVT 3.0 that would be great; if not, I might be able to look at it as well if/when I get access to the datasheet.

By the way @shulkinj, I saw you also implemented the STC mode, and you said in the pull request that you were also working on the Event Rate Controller (ERC) mode. How far did you get with that? From the documentation (the Event Signal Processing page of the Metavision SDK Docs 5.3.1), it seems that would also solve the issue of exceeding data rate limits elegantly: a hard cap on the max number of transmitted events, with events dropped gradually to stay below the maximum, so the buffer should never overflow. If you don’t have time to finish it, I could maybe take a look at it as well.
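Conceptually, a rate controller of this kind trades completeness for continuity: instead of letting the FIFO overflow and lose a whole window, it thins events evenly so the output never exceeds a budget. The sketch below only illustrates that idea on the host side, assuming events are grouped into fixed time windows; it is not Prophesee’s ERC algorithm or its register interface, and `rate_limit_window` is a hypothetical helper:

```python
def rate_limit_window(events, max_events):
    """Keep at most `max_events` events from one time window.

    Drops are spread evenly across the window (keep events at fractional
    steps of n / max_events) rather than truncating the tail, so the
    surviving events still cover the whole window.
    """
    n = len(events)
    if n <= max_events:
        return list(events)
    if max_events <= 0:
        return []
    step = n / max_events  # > 1 here, so the picked indices are distinct
    return [events[int(i * step)] for i in range(max_events)]


window = list(range(10))  # stand-in for 10 events in one window
print(rate_limit_window(window, 4))  # prints: [0, 2, 5, 7]
```

Spreading the drops, rather than keeping the first N events, is what avoids the “nothing for a while” failure mode described in the first post.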

Hi, okay, while I can’t post the datasheet, here’s what Claude says on how to decode EVT2.1 and add it to the current viz tool:

Decoding the GenX320 raw stream when it’s in EVT2.1 mode

If you’re capturing with the genx320-event-streaming tool but you’ve switched the sensor to EVT2.1, the bytes on the wire are fine — you just have to decode them differently. Here’s the one gotcha and the decoder.

The 32-bit vs 64-bit gotcha

IOCTL_GENX320_READ_EVENTS_RAW packs the stream as 32-bit words, because it was written for EVT2.0, where one event = one 32-bit word. In EVT2.1 each word is 64 bits, i.e. two of those 32-bit slots (and one CD word can describe up to 32 pixels on a row). The cam still ships the same raw bytes, but:

  • The 32-bit-oriented count is now twice the real number of EVT2.1 words, i.e. the word count halves going to EVT2.1. Same bytes, half as many words (each CD word can still expand to several pixel events).
  • Each EVT2.1 word lands as two consecutive little-endian 32-bit words: the low half (bits 31..0) first, then the high half (bits 63..32, which is where the type / timestamp / x / y actually live). So if you keep decoding the stream as EVT2.0 32-bit words, every other “event” is really a low half (the 32-bit valid mask for CD words, unused bits for the other types) and the rest is misread garbage.
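The pairing can be checked with the standard library alone. This sketch packs an arbitrary 64-bit value (not a real event; chosen only so the two halves differ), reads the same 8 bytes back as two little-endian 32-bit words, and reassembles them:

```python
import struct

word64 = 0x1A2B3C4D_DEADBEEF       # arbitrary example value, not a real event
raw = struct.pack("<Q", word64)    # 8 bytes, as they would sit in the stream

low, high = struct.unpack("<2I", raw)  # EVT2.0-style view: two uint32 words
print(hex(low), hex(high))             # prints: 0xdeadbeef 0x1a2b3c4d

# Reading the same bytes as one little-endian uint64 undoes the split.
(rebuilt,) = struct.unpack("<Q", raw)
assert rebuilt == (high << 32) | low == word64
```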

The fix is a one-liner: take the exact same byte buffer the tool captures and read it as little-endian uint64 instead of uint32. Little-endian byte order pairs the two 32-bit halves back into the original 64-bit word for you.

# in the tool's PC-side decode, where it currently does:
#   words = np.frombuffer(data, dtype='<u4')   # EVT2.0
# do this instead:
words = np.frombuffer(data, dtype='<u8')        # EVT2.1, 64-bit events

Everything downstream then works on words with the EVT2.1 bit layout below.

The decoder

import numpy as np

EVT_NEG       = 0x0   # CD event, OFF (illumination decrease)
EVT_POS       = 0x1   # CD event, ON  (illumination increase)
EVT_TIME_HIGH = 0x8
EXT_TRIGGER   = 0xA

def _bits(x, hi, lo):                       # bits [hi:lo] out of a uint64 array
    return (x >> np.uint64(lo)) & np.uint64((1 << (hi - lo + 1)) - 1)

def decode_evt21(data):
    """data: raw little-endian byte stream from READ_EVENTS_RAW (EVT2.1).
    Returns (cd_events, triggers):
      cd_events: (M,4) -> [t_us, x, y, polarity]   polarity 1=ON, 0=OFF
      triggers:  (K,3) -> [t_us, channel_id, edge] edge 1=rising, 0=falling
    """
    w = np.frombuffer(data, dtype='<u8')    # 64-bit EVT2.1 words
    etype = _bits(w, 63, 60)

    # TIME_HIGH carries ts[33:6]; carry the latest one forward to every event.
    th     = etype == EVT_TIME_HIGH
    th_val = np.where(th, _bits(w, 59, 32), np.uint64(0))
    src    = np.maximum.accumulate(np.where(th, np.arange(w.size), 0))
    time_high = th_val[src]

    # --- CD events: each word is a vector of up to 32 pixels on one row ---
    cd    = np.flatnonzero((etype == EVT_NEG) | (etype == EVT_POS))
    t     = (time_high[cd] << np.uint64(6)) | _bits(w[cd], 59, 54)   # full us
    x_b   = _bits(w[cd], 53, 43).astype(np.uint32)                  # x, mult. of 32
    y     = _bits(w[cd], 42, 32).astype(np.uint32)
    vmask = _bits(w[cd], 31,  0).astype(np.uint32)                  # 32-bit valid
    pol   = (etype[cd] == EVT_POS).astype(np.uint8)

    # expand the valid mask: bit n set -> an event at (x_b + n, y)
    onbits = ((vmask[:, None] >> np.arange(32, dtype=np.uint32))
              & np.uint32(1)).astype(bool)
    r, n = np.nonzero(onbits)
    cd_events = np.column_stack([t[r], x_b[r] + n.astype(np.uint32),
                                 y[r], pol[r]])

    # --- external triggers (same timestamp reconstruction) ---
    tg  = np.flatnonzero(etype == EXT_TRIGGER)
    tgt = (time_high[tg] << np.uint64(6)) | _bits(w[tg], 59, 54)
    tid = _bits(w[tg], 44, 40).astype(np.uint8)   # 0=ext_trigger, 1=PXRSTN
    edg = _bits(w[tg], 32, 32).astype(np.uint8)   # 1=rising, 0=falling
    triggers = np.column_stack([tgt, tid, edg])

    return cd_events, triggers

Timestamps come out as the full 34-bit microsecond value: the 6 low bits ride on each event (ts[5:0]), the high 28 bits (ts[33:6]) come from the most recent TIME_HIGH, combined as (time_high << 6) | ts_low. Triggers use the exact same reconstruction, so a rising edge on the external trigger pin is a row in triggers with channel_id == 0 and edge == 1 at a real µs timestamp.
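For spot-checking the vectorized decoder on a handful of words, a plain-Python, one-word-at-a-time version of the same EVT2.1 bit layout can help. `Evt21Decoder`, `cd_word` and `time_high_word` are hypothetical helpers (not part of the streaming tool); the decoder keeps the latest TIME_HIGH as state across calls and skips events seen before the first TIME_HIGH, as the “Watch out for” notes recommend:

```python
import struct

EVT_NEG, EVT_POS, EVT_TIME_HIGH, EXT_TRIGGER = 0x0, 0x1, 0x8, 0xA


def bits(w, hi, lo):
    """Extract bits [hi:lo] of an integer."""
    return (w >> lo) & ((1 << (hi - lo + 1)) - 1)


class Evt21Decoder:
    """Scalar EVT2.1 decoder (illustrative). Keeps TIME_HIGH across calls."""

    def __init__(self):
        self.time_high = None  # unknown until the first TIME_HIGH word

    def decode(self, data):
        """Yield ('cd', t_us, x, y, pol) and ('trig', t_us, channel_id, edge)."""
        for (w,) in struct.iter_unpack("<Q", data):  # len(data) must be a multiple of 8
            etype = bits(w, 63, 60)
            if etype == EVT_TIME_HIGH:
                self.time_high = bits(w, 59, 32)    # ts[33:6]
                continue
            if self.time_high is None:
                continue                            # no time base yet: drop
            t = (self.time_high << 6) | bits(w, 59, 54)
            if etype in (EVT_NEG, EVT_POS):
                x_base, y, mask = bits(w, 53, 43), bits(w, 42, 32), bits(w, 31, 0)
                pol = 1 if etype == EVT_POS else 0
                for n in range(32):                 # expand the valid mask
                    if (mask >> n) & 1:
                        yield ("cd", t, x_base + n, y, pol)
            elif etype == EXT_TRIGGER:
                yield ("trig", t, bits(w, 44, 40), bits(w, 32, 32))


def cd_word(pol, ts_low, x_base, y, mask):
    """Build a CD word for testing (x_base should be a multiple of 32)."""
    etype = EVT_POS if pol else EVT_NEG
    return (etype << 60) | (ts_low << 54) | (x_base << 43) | (y << 32) | mask


def time_high_word(th):
    return (EVT_TIME_HIGH << 60) | (th << 32)


stream = struct.pack(
    "<3Q",
    cd_word(1, 5, 64, 10, 0b1),    # before any TIME_HIGH -> dropped
    time_high_word(2),             # time base: 2 << 6 = 128 us
    cd_word(1, 5, 64, 10, 0b101),  # pixels x=64 and x=66 at t = 133 us
)
print(list(Evt21Decoder().decode(stream)))
# prints: [('cd', 133, 64, 10, 1), ('cd', 133, 66, 10, 1)]
```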

Optional: feed it back into the tool’s visualizer

The streaming tool’s events_to_image() expects the 6-column [type, sec, ms, us, x, y] layout. If you want the EVT2.1 events to flow straight into it, split the µs timestamp and map polarity to the ON/OFF type:

def to_six_col(cd_events):
    t, x, y, p = cd_events.T
    sec = (t // 1_000_000).astype(np.uint16)
    ms  = ((t // 1000) % 1000).astype(np.uint16)
    us  = (t % 1000).astype(np.uint16)
    typ = p.astype(np.uint16)                 # 1=ON, 0=OFF (matches PIX_ON/OFF)
    return np.column_stack([typ, sec, ms, us,
                            x.astype(np.uint16), y.astype(np.uint16)])

(sec as uint16 wraps at ~18 h, same as the tool’s native format.)

Watch out for

  • Drop events before the first TIME_HIGH. Until one arrives, time_high is 0 and the timestamps are wrong: the forward-fill leaves them with only the low 6 bits, so filter them out.
  • Ship whole 64-bit words. Make sure each captured chunk is a multiple of 8 bytes before frombuffer('<u8'), or stitch chunks together first; a packet boundary that splits a 64-bit word will misalign everything after it.
  • 34-bit µs counter wraps ~every 4.77 h — handle time_high going backwards if you record longer.
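The last two points can be handled on the PC side before decoding. Here is a minimal sketch with hypothetical names (`WordAligner`, `TimeHighUnwrapper`, not part of the streaming tool): the first carries any trailing partial word over to the next chunk so only whole 8-byte words reach the decoder, and the second adds 2^28 to TIME_HIGH each time the raw value goes backwards, assuming the stream is never silent for a full wrap period:

```python
class WordAligner:
    """Re-chunk an arbitrary byte stream into whole fixed-size words."""

    def __init__(self, word_size=8):
        self.word_size = word_size
        self.pending = b""  # leftover bytes of a word split across chunks

    def feed(self, chunk):
        """Return the longest prefix of (pending + chunk) made of whole words."""
        data = self.pending + chunk
        cut = len(data) - len(data) % self.word_size
        self.pending = data[cut:]
        return data[:cut]


class TimeHighUnwrapper:
    """Extend the 28-bit TIME_HIGH field across wraps (~every 4.77 h)."""

    WRAP = 1 << 28

    def __init__(self):
        self.last = None
        self.epoch = 0

    def update(self, raw_time_high):
        """Return an unwrapped TIME_HIGH; assumes at most one wrap between calls."""
        if self.last is not None and raw_time_high < self.last:
            self.epoch += 1  # counter wrapped
        self.last = raw_time_high
        return self.epoch * self.WRAP + raw_time_high


aligner = WordAligner()
print(len(aligner.feed(bytes(13))), len(aligner.feed(bytes(3))))  # prints: 8 8
```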

Bit layout is from the EVT2.1 spec; the only OpenMV-specific change vs the stock raw decode is reading the stream as <u8 instead of <u4 because the sensor is now emitting 64-bit events. Pin your firmware version when you post.

Mmm, actually, give me a bit, I’ll just update the GUI to pull EVT2.1 and EVT3.0 and the AER format.

Okay, I updated it to support EVT2.1 and EVT3.0 along with the legacy AER format. Each one wins in different, scene-dependent situations. See tools/genx320-event-streaming in the openmv/openmv-projects repository on GitHub (master branch).

That said, you may still want the huge event buffer sizes with plenty of FIFO depth.

Concerning the ERC, I haven’t touched it in a hot minute, mostly because Prophesee wasn’t implementing it (or the NFL filter) in the actual GENX320 driver code. As such, we’d have to modify Prophesee’s GENX320 driver code (psee_genx320.c) to support the ERC. I’d be willing to take another crack at it, though I’m not sure I’d be able to fully debug it, as I have some pressing deadlines coming up.

Unfortunately, neither EVT2.1 nor EVT3.0 seems to improve the events/s over EVT2.0; I think my use case doesn’t have a high enough event density to take advantage of the compression in these formats. EVT3.0 maybe improves it slightly, but not by much. I really appreciate you adding this functionality though, I’m sure it could be very useful in other cases! For now I’ll try to work with histograms instead and increase the throughput there.

w.r.t. the ERC I was mostly just curious if it was feasible but if it requires modifying Prophesee’s driver code that seems like a large amount of work / debugging. I might have a look at it later, but for now I’ll focus on the histogram mode. Thanks again for all the work and the fast replies!

Hi, try out the AER mode. Interestingly enough, it packs events more densely, as it’s 24 bits per event versus 32 bits.
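For a fixed link, bits per event sets the ceiling directly. Assuming the ~12.8 MB/s measured earlier in the thread and no framing overhead, 24-bit AER events allow about a third more events per second than 32-bit EVT2.0 events:

```python
LINK_BYTES_PER_S = 12.8e6  # measured USB throughput from this thread (assumption: no overhead)


def max_events_per_s(bits_per_event, link_bytes_per_s=LINK_BYTES_PER_S):
    """Upper bound on events/s for a given encoding size."""
    return link_bytes_per_s * 8 / bits_per_event


evt20 = max_events_per_s(32)
aer = max_events_per_s(24)
print(f"EVT2.0: {evt20 / 1e6:.2f} M ev/s, AER: {aer / 1e6:.2f} M ev/s, gain {aer / evt20:.2f}x")
```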

Also, change the biases to lower the event rates: GENX320 Event Camera - OpenMV MicroPython 1.28 documentation