TensorFlow Lite model output type error

Hi, I am using OpenMV firmware 4.5.9 and OpenMV IDE 4.2.0 to deploy a detection model. On OpenMV, the quantized INT8 model returns an output of type float32, which differs from what I get when validating with Python scripts. I’ve checked my quantized model, detect_1000_int8.tflite, and the quantization seems successful.
For example, the same image produces different outputs in the Python validation script and in the OpenMV MicroPython script:

logits: array([-16.358, 15.7667], dtype=float32) # OpenMV
logits:  [[-67  57]] # Python

I also checked the TFLite model tensor details and confirmed that all operators are quantized to INT8 or INT16. I wonder why the output of the quantized model on OpenMV is float32, and why the results differ so much from the TensorFlow validation in Python.

The following are the scripts I used:

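# Validation on PC

import math

import numpy as np
import tensorflow as tf
import torch

# num_samples, extracted_images and fivecrop_scale() are defined elsewhere (not shown here).
interpreter = tf.lite.Interpreter(model_path='/home/user/detect_1000_hw100_int8.tflite')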
interpreter.allocate_tensors()  

input_details = interpreter.get_input_details()  
output_details = interpreter.get_output_details()  

print("Checking TFLite model tensor details...")
for detail in interpreter.get_tensor_details():
    print(f"Tensor Name: {detail['name']}, Type: {detail['dtype']}")

test_dataset = []
for i in range(num_samples):  
    img = torch.tensor(extracted_images[i], dtype=torch.float32)
    img = fivecrop_scale(img, crop_size=450)  # scale
    img_array = img.numpy()
    scale, zero_point = input_details[0]['quantization']
    img_int8 = np.round(img_array / scale + zero_point).astype(np.int8)  # quant input to int8
    img_int8 = np.expand_dims(img_int8, axis=0)  
    test_dataset.append(img_int8)

def run_inference_tflite(image_np):  
    interpreter.set_tensor(input_details[0]['index'], image_np)
    interpreter.invoke()  
    output_data = interpreter.get_tensor(output_details[0]['index'])  
    return output_data  

all_preds = []
i = 1
for img in test_dataset:  
    pred = run_inference_tflite(img)
    print('index',i, 'Pred', math.ceil(1 / (1 + np.exp(-pred[0][1]))))
    i = i + 1
    all_preds.append(pred)  



# Validation on OpenMV

    model = ml.Model("detect_1000_hw100_int8.tflite", load_to_fb=True)
    #print('model loaded')
    predicted_class = []
    num_iterations = 10

    for _ in range(num_iterations):
        if _ < 7:
            print('collecting ECGdata')
            samples = ECG_dataset.__getitem__()
            for sample in samples:
                input_list = []
                input_list.append(samples[0])
                logits = model.predict(input_list)
                predicted_class.append(probability(logits))

Here are the tensor details of the quantized model:

# Scripts:
print("Checking TFLite model tensor details...")
for detail in interpreter.get_tensor_details():
    print(f"Tensor Name: {detail['name']}, Type: {detail['dtype']}")

# OUTPUT:
Checking TFLite model tensor details...
Tensor Name: serving_default_input:0, Type: <class 'numpy.int8'>
Tensor Name: transpose_1/perm, Type: <class 'numpy.int32'>
Tensor Name: transpose_10/perm, Type: <class 'numpy.int32'>
Tensor Name: Const, Type: <class 'numpy.int32'>
Tensor Name: cond/transpose, Type: <class 'numpy.int32'>
Tensor Name: split_14, Type: <class 'numpy.int32'>
Tensor Name: flatten/Reshape/shape, Type: <class 'numpy.int32'>
Tensor Name: convolution, Type: <class 'numpy.int8'>
Tensor Name: Add;convolution;Const_1, Type: <class 'numpy.int32'>
Tensor Name: convolution_1, Type: <class 'numpy.int8'>
Tensor Name: Add_1;convolution;convolution_1;Const_3, Type: <class 'numpy.int32'>
Tensor Name: convolution_2, Type: <class 'numpy.int8'>
Tensor Name: Add_2;convolution_2;Const_5, Type: <class 'numpy.int32'>
Tensor Name: convolution_3, Type: <class 'numpy.int8'>
Tensor Name: Add_3;convolution;convolution_3;Const_7, Type: <class 'numpy.int32'>
Tensor Name: convolution_4, Type: <class 'numpy.int8'>
Tensor Name: Add_4;convolution;convolution_4;Const_9, Type: <class 'numpy.int32'>
Tensor Name: convolution_5, Type: <class 'numpy.int8'>
Tensor Name: Add_5;convolution_2;convolution_5;Const_11, Type: <class 'numpy.int32'>
Tensor Name: convolution_6, Type: <class 'numpy.int8'>
Tensor Name: Add_6;convolution;convolution_6;Const_13, Type: <class 'numpy.int32'>
Tensor Name: convolution_7, Type: <class 'numpy.int8'>
Tensor Name: Add_7;convolution;convolution_7;Const_15, Type: <class 'numpy.int32'>
Tensor Name: convolution_8, Type: <class 'numpy.int8'>
Tensor Name: Add_8;convolution_2;convolution_8;Const_17, Type: <class 'numpy.int32'>
Tensor Name: convolution_9, Type: <class 'numpy.int8'>
Tensor Name: Add_9;convolution;convolution_9;Const_19, Type: <class 'numpy.int32'>
Tensor Name: convolution_10, Type: <class 'numpy.int8'>
Tensor Name: Add_10;convolution_10;Const_21, Type: <class 'numpy.int32'>
Tensor Name: convolution_11, Type: <class 'numpy.int8'>
Tensor Name: Add_11;convolution_10;convolution_11;Const_23, Type: <class 'numpy.int32'>
Tensor Name: convolution_12, Type: <class 'numpy.int8'>
Tensor Name: Add_12;convolution_12;Const_25, Type: <class 'numpy.int32'>
Tensor Name: convolution_13, Type: <class 'numpy.int8'>
Tensor Name: Add_13;convolution_10;convolution_13;Const_27, Type: <class 'numpy.int32'>
Tensor Name: MatMul, Type: <class 'numpy.int8'>
Tensor Name: Const_30, Type: <class 'numpy.int32'>
Tensor Name: transpose_1, Type: <class 'numpy.int8'>
Tensor Name: Add;convolution;Const_11, Type: <class 'numpy.int8'>
Tensor Name: transpose_2, Type: <class 'numpy.int8'>
Tensor Name: Pad, Type: <class 'numpy.int8'>
Tensor Name: transpose_4, Type: <class 'numpy.int8'>
Tensor Name: onnx_tf_prefix_Relu_2;Add_1;convolution;convolution_1;Const_3, Type: <class 'numpy.int8'>
Tensor Name: transpose_5, Type: <class 'numpy.int8'>
Tensor Name: onnx_tf_prefix_Relu_6;Add_2;convolution_2;Const_5, Type: <class 'numpy.int8'>
Tensor Name: Add_3;convolution;convolution_3;Const_71, Type: <class 'numpy.int8'>
Tensor Name: cond/onnx_tf_prefix_Pad_3, Type: <class 'numpy.int8'>
Tensor Name: transpose_6, Type: <class 'numpy.int8'>
Tensor Name: avg_pool, Type: <class 'numpy.int8'>
Tensor Name: onnx_tf_prefix_Add_8, Type: <class 'numpy.int8'>
Tensor Name: transpose_13, Type: <class 'numpy.int8'>
Tensor Name: Pad_1, Type: <class 'numpy.int8'>
Tensor Name: transpose_15, Type: <class 'numpy.int8'>
Tensor Name: onnx_tf_prefix_Relu_10;Add_4;convolution;convolution_4;Const_9, Type: <class 'numpy.int8'>
Tensor Name: transpose_16, Type: <class 'numpy.int8'>
Tensor Name: onnx_tf_prefix_Relu_14;Add_5;convolution_2;convolution_5;Const_11, Type: <class 'numpy.int8'>
Tensor Name: Add_6;convolution;convolution_6;Const_131, Type: <class 'numpy.int8'>
Tensor Name: cond_1/onnx_tf_prefix_Pad_11, Type: <class 'numpy.int8'>
Tensor Name: transpose_17, Type: <class 'numpy.int8'>
Tensor Name: avg_pool_1, Type: <class 'numpy.int8'>
Tensor Name: onnx_tf_prefix_Add_16, Type: <class 'numpy.int8'>
Tensor Name: transpose_24, Type: <class 'numpy.int8'>
Tensor Name: Pad_2, Type: <class 'numpy.int8'>
Tensor Name: transpose_26, Type: <class 'numpy.int8'>
Tensor Name: onnx_tf_prefix_Relu_18;Add_7;convolution;convolution_7;Const_15, Type: <class 'numpy.int8'>
Tensor Name: transpose_27, Type: <class 'numpy.int8'>
Tensor Name: onnx_tf_prefix_Relu_22;Add_8;convolution_2;convolution_8;Const_17, Type: <class 'numpy.int8'>
Tensor Name: Add_9;convolution;convolution_9;Const_191, Type: <class 'numpy.int8'>
Tensor Name: cond_2/onnx_tf_prefix_Pad_19, Type: <class 'numpy.int8'>
Tensor Name: transpose_28, Type: <class 'numpy.int8'>
Tensor Name: avg_pool_2, Type: <class 'numpy.int8'>
Tensor Name: onnx_tf_prefix_Add_24, Type: <class 'numpy.int8'>
Tensor Name: Add_10;convolution_10;Const_211, Type: <class 'numpy.int8'>
Tensor Name: transpose_38, Type: <class 'numpy.int8'>
Tensor Name: Pad_3, Type: <class 'numpy.int8'>
Tensor Name: transpose_40, Type: <class 'numpy.int8'>
Tensor Name: onnx_tf_prefix_Relu_27;Add_11;convolution_10;convolution_11;Const_23, Type: <class 'numpy.int8'>
Tensor Name: transpose_41, Type: <class 'numpy.int8'>
Tensor Name: onnx_tf_prefix_Relu_31;Add_12;convolution_12;Const_25, Type: <class 'numpy.int8'>
Tensor Name: Add_13;convolution_10;convolution_13;Const_271, Type: <class 'numpy.int8'>
Tensor Name: cond_3/onnx_tf_prefix_Pad_28, Type: <class 'numpy.int8'>
Tensor Name: transpose_42, Type: <class 'numpy.int8'>
Tensor Name: avg_pool_3, Type: <class 'numpy.int8'>
Tensor Name: onnx_tf_prefix_Add_33, Type: <class 'numpy.int8'>
Tensor Name: transpose_49, Type: <class 'numpy.int8'>
Tensor Name: Mean, Type: <class 'numpy.int8'>
Tensor Name: flatten/Reshape;onnx_tf_prefix_Reshape_40, Type: <class 'numpy.int8'>
Tensor Name: PartitionedCall:0, Type: <class 'numpy.int8'>
Tensor Name: , Type: <class 'numpy.int32'>
Tensor Name: , Type: <class 'numpy.int32'>
Tensor Name: , Type: <class 'numpy.int32'>
Tensor Name: , Type: <class 'numpy.int8'>
Tensor Name: , Type: <class 'numpy.int8'>
Tensor Name: , Type: <class 'numpy.int8'>
Tensor Name: , Type: <class 'numpy.int8'>
Tensor Name: , Type: <class 'numpy.int8'>
Tensor Name: , Type: <class 'numpy.int8'>

Hi, our library is always going to output floating point values. It applies the scale/offset from the last layer of the model to the INT8 output to turn that into floating point.

What are the scale/offset values of the last layer of the model?

Hi! You mean the scale and zero_point? They’re:

scale = 0.19708408415317535 zero_point = -4

The code will apply this:

However, on checking your output it doesn’t match what’s expected. The same kind of process goes on for the input too:

What are your input and output expectations? You should be applying the scale/offset when inputting data and doing the same when reading the output.
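For reference, here is a minimal sketch of the standard TFLite affine mapping, real = scale * (q - zero_point), applied to the numbers quoted in this thread. The helper name is hypothetical and this is not the firmware's code; it only shows which float values the PC logits [-67, 57] would correspond to, assuming the scale and zero_point given above belong to the output tensor.

```python
# Hypothetical helper illustrating the standard TFLite affine mapping.
# It is NOT taken from the OpenMV firmware.

OUTPUT_SCALE = 0.19708408415317535  # output scale quoted in this thread
OUTPUT_ZERO_POINT = -4              # output zero_point quoted in this thread


def dequantize(q_values, scale, zero_point):
    """Map raw int8 values to real values: real = scale * (q - zero_point)."""
    return [scale * (q - zero_point) for q in q_values]


pc_logits_int8 = [-67, 57]  # raw int8 logits printed by the PC interpreter
pc_logits_float = dequantize(pc_logits_int8, OUTPUT_SCALE, OUTPUT_ZERO_POINT)
print([round(v, 4) for v in pc_logits_float])
```

With these inputs the values come out near -12.42 and 12.02, which is not what OpenMV printed for the same image, so the gap is not explained by the float conversion alone.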

I’ve applied quantization for my input:

def quantize_pixel(pixel):
    """
        q_val = round(pixel / SCALE + ZERO_POINT)
        q_val = clamp(q_val, -128, 127)
    """
    #SCALE = 0.003921532537788153
    SCALE = 1
    ZERO_POINT = -128
    # (pixel / scale + zero_point)
    q_val = pixel * (1.0 / SCALE) + ZERO_POINT

    #  clamp to [-128, 127]
    if q_val > 127:
        q_val = 127
    elif q_val < -128:
        q_val = -128

    return int(q_val)

    # Separate excerpt from a class method (it uses self.crop_size): load and preprocess the image
    img = image.Image(img_file, copy_to_fb=True)

    # Preprocess
    h = img.height()
    w = img.width()
    x_scale = self.crop_size / h
    y_scale = self.crop_size / w
    img.to_rgb565(x_scale = x_scale,
                  y_scale = y_scale,
                  rgb_channel = -1,
                  alpha = 255,
                  color_palette=None,
                  alpha_palette=None,
                  hint = 0,
                  copy = False,
                  copy_to_fb = False)
    buf = np.zeros((1, 3, self.crop_size, self.crop_size), dtype=np.int8)
    for y in range(img.height()):
        for x in range(img.width()):
            (r, g, b) = img.get_pixel(x, y, rgbtuple=True)
            q_r = quantize_pixel(r)
            q_g = quantize_pixel(g)
            q_b = quantize_pixel(b)
            buf[0, 0, x, y] = q_r
            buf[0, 1, x, y] = q_g
            buf[0, 2, x, y] = q_b

In the Python scripts, I apply the scale/offset because the original inputs are normalized to [0, 1]. In the OpenMV scripts, however, the inputs are RGB565, so I apply the scale/offset as:

q_val = pixel * (1.0 / 1) - 128

for every single pixel.
For one specific example, the output should be

INPUT : INT8 image
OUTPUT logits:  [[-67  57]]

But with OpenMV inference it becomes

INPUT : RGB565 image
OUTPUT logits: array([-14.7813, 13.993], dtype=float32)
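One way to compare the two results on equal terms is to map the OpenMV floats back to raw int8 values with the inverse mapping, q = round(real / scale) + zero_point. This is a minimal sketch, assuming the output scale and zero_point quoted earlier in the thread apply to this tensor; the helper name is hypothetical.

```python
# Hypothetical helper: invert the affine mapping to recover raw int8 values.

OUTPUT_SCALE = 0.19708408415317535  # output scale quoted earlier in the thread
OUTPUT_ZERO_POINT = -4              # output zero_point quoted earlier in the thread


def requantize(real_values, scale, zero_point):
    """Map real values back to raw integers: q = round(real / scale) + zero_point."""
    return [round(v / scale) + zero_point for v in real_values]


openmv_logits = [-14.7813, 13.993]  # float32 logits printed on OpenMV
openmv_int8 = requantize(openmv_logits, OUTPUT_SCALE, OUTPUT_ZERO_POINT)
print(openmv_int8)
```

With these numbers the floats land almost exactly on integer multiples of the scale, giving raw values of [-79, 67] rather than the [-67, 57] from the PC run. That would suggest the difference arises before the output conversion, for example in how the input is prepared, though this is only an inference from the samples posted here.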

So… you’re passing the ndarray directly to predict() right? If so, then it definitely only passes through this:

Where input scale and input offset come from the first layer of the model.

model_input_8[i] is then the first layer of the model in int8 form. Note that ndarray_get_float_index() returns the float representation of whatever the underlying type of the ndarray is.

When the last layer executes it will come out here: openmv/src/omv/modules/py_ml.c at master · openmv/openmv · GitHub

And be turned into a floating point ndarray using the last layer zero_point and scale.

Quantizing the pixels in the ndarray yourself doesn’t make sense, as the library will quantize them again.
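To illustrate the double quantization, here is a small simulation. It is only a sketch of the input path as described in this thread, not the firmware code. It assumes the model's input scale is the commented-out SCALE from the post above (0.003921532537788153) with zero_point -128, that the library saturates to the int8 range (it may behave differently), and that the model expects real inputs in [0, 1]. All function names are hypothetical.

```python
# Simulation only: models the input path as described in this thread.
# Scale, zero_point and saturation behaviour are assumptions.

INPUT_SCALE = 0.003921532537788153  # assumed: commented-out SCALE in the post above
INPUT_ZERO_POINT = -128             # assumed: ZERO_POINT in the post above


def clamp_int8(value):
    """Saturate an integer to the int8 range."""
    return max(-128, min(127, value))


def library_quantize(real_value):
    """Assumed library step: q = round(real / scale) + zero_point, saturated."""
    return clamp_int8(round(real_value / INPUT_SCALE) + INPUT_ZERO_POINT)


def user_prequantize(pixel):
    """The poster's quantize_pixel() with SCALE = 1 and ZERO_POINT = -128."""
    return clamp_int8(int(pixel * (1.0 / 1) - 128))


pixels = [0, 64, 128, 129, 200, 255]

# Pass normalized floats and let the library quantize once.
quantized_once = [library_quantize(p / 255.0) for p in pixels]

# Pre-quantize by hand, then let the library quantize again.
quantized_twice = [library_quantize(user_prequantize(p)) for p in pixels]

print("once :", quantized_once)
print("twice:", quantized_twice)
```

Under these assumptions, quantizing once maps each pixel p to p - 128, while quantizing twice pushes every value to one end of the int8 range (-128 or 127), so the network would effectively see a two-level image.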

Note that for images, the input method is bypassed and this code is run instead:

which runs

and here’s to_ndarray()

Can you post some sort of test case, model, input image, expected output? We’ve debugged this code with quite a few networks, though… so this is probably an expectation mismatch somewhere.