Success stories: batch ml predict?

Did anyone manage to train a tflite model that can successfully make predictions on batches of images on the OpenMV H7 Plus, 4.7 release? I want to see if I can get a performance increase on a small model of mine by predicting in batches rather than serially. I am trying and failing miserably when loading the model, which was created with TensorFlow using the same quantization parameters as my normal serial model, but with a different representative_dataset construct.

```python
def representative_data_gen():
    samples_per_class = rep_data_samples_per_class
    class_counts = {j: 0 for j in range(NUM_CLASSES)}
    batch_imgs = []

    for img, label in rep_data:
        true_class = np.argmax(label)
        # Skip classes that already have enough representative samples.
        if class_counts[true_class] >= samples_per_class:
            continue
        batch_imgs.append(img)
        class_counts[true_class] += 1
        if len(batch_imgs) == BATCH_SIZE:
            yield [np.stack(batch_imgs, axis=0).astype(np.float32)]
            batch_imgs = []

    # Pad last incomplete batch with duplicates
    if batch_imgs:
        while len(batch_imgs) < BATCH_SIZE:
            batch_imgs.append(batch_imgs[-1])
        yield [np.stack(batch_imgs, axis=0).astype(np.float32)]
```
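For context, the generator assumes a few names defined earlier in my script. A minimal sketch of what they could look like (the concrete values, calib_images, and calib_labels are illustrative assumptions, not the original code):

```python
import numpy as np
import tensorflow as tf

NUM_CLASSES = 4                  # example value
BATCH_SIZE = 8                   # fixed batch size baked into the model input
rep_data_samples_per_class = 25  # representative samples to take per class

# rep_data must yield (image, one_hot_label) pairs, e.g. a tf.data.Dataset
# built from held-out calibration data (shapes are illustrative):
rep_data = tf.data.Dataset.from_tensor_slices((
    calib_images.astype(np.float32),        # (N, H, W, C) array
    tf.one_hot(calib_labels, NUM_CLASSES),  # (N, NUM_CLASSES) labels
))
```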

```python
…

# Convert to quantized TFLite
converter = tf.lite.TFLiteConverter.from_keras_model(fixed_model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8
converter.representative_dataset = representative_data_gen
tflite_model = converter.convert()
```
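The fixed_model passed to the converter presumably already has a static batch dimension; a minimal sketch of one way to pin it, assuming trained_model is the original Keras model and H, W, C are its input dimensions (all names here are illustrative):

```python
# Wrap the trained model so the converted .tflite gets a static
# [BATCH_SIZE, H, W, C] input rather than a dynamic batch dimension.
inputs = tf.keras.Input(batch_shape=(BATCH_SIZE, H, W, C))
outputs = trained_model(inputs)  # reuses the trained weights
fixed_model = tf.keras.Model(inputs, outputs)
```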

Could it be that ml doesn't support batch inference?
I tried loading from both ROMFS and flash, with load_to_fb set to both True and False:

```python
tf_mod = ml.Model(F)
```

```
Model size: 39096 bytes  Allocated RAM: 6016  Free RAM: 4331264
Load failed: Failed to allocate tensors
Allocated RAM: 4104384  Free RAM: 232896
```
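For reference, the two load variants were along these lines (the file path is a placeholder; ml.Model and load_to_fb are the OpenMV ml API):

```python
import ml

# From the filesystem, tensor arena on the MicroPython heap:
tf_mod = ml.Model("batch_model.tflite", load_to_fb=False)

# Same model, tensor arena placed in the frame buffer instead:
tf_mod = ml.Model("batch_model.tflite", load_to_fb=True)
```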

Note: the batch tflites are generated from the same h5 files from which I also generate the serial tflite models, and those run successfully, so there are no operations in the model that the OpenMV cannot handle.

Managed to load now, after I was reminded by an old post of mine at the bottom of the thread of an unsupported operation in the old tf library. This flag was not required after the tf-to-ml migration, so I had removed it from my training procedure:

```python
converter._experimental_disable_per_channel_quantization_for_dense_layers = True
```

I have added it back to the TensorFlow TFLite conversion procedure and now the model loads.

After timing serial vs. batch inference, it is evident that on this platform there is no performance gain from batching over serial prediction.

Unless there are some batch-specific optimization steps to be done in TensorFlow.
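For reference, the comparison I ran was along these lines (a rough sketch: the model file names, input shape, and the assumption that predict() accepts one batch-shaped ndarray are illustrative, not my exact script):

```python
import ml
import time
from ulab import numpy as np

BATCH = 8
serial_mod = ml.Model("serial_model.tflite")  # input shape [1, H, W, C]
batch_mod = ml.Model("batch_model.tflite")    # input shape [BATCH, H, W, C]

# Dummy inputs; H/W/C must match the models (96x96x1 is an example).
one = np.zeros((1, 96, 96, 1), dtype=np.float)
batch = np.zeros((BATCH, 96, 96, 1), dtype=np.float)

# Serial: BATCH separate predict() calls.
t0 = time.ticks_ms()
for _ in range(BATCH):
    serial_mod.predict([one])
t_serial = time.ticks_diff(time.ticks_ms(), t0)

# Batch: one predict() call over the whole stack.
t0 = time.ticks_ms()
batch_mod.predict([batch])
t_batch = time.ticks_diff(time.ticks_ms(), t0)

print("serial:", t_serial, "ms, batch:", t_batch, "ms")
```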

If anyone has an application such as mine, I would be glad to exchange thoughts.

Yeah, there would be no performance gain, as the batch operations are for when you have really wide SIMD paths, which the H7/RT1062 don't have. The N6 and AE3 have an NPU onboard, which can be helpful for this.

Thanks, will consider it in a future upgrade.