TensorFlow Lite model output type error

Hi, I am using OpenMV firmware 4.5.9 and OpenMV IDE 4.2.0 to deploy a detection model. The quantized INT8 model produces a float32 output, which differs from the validation in my Python scripts. I’ve checked my quantized model detect_1000_int8.tflite, and the quantization seems successful.
For example, an identical image gives different outputs in the Python validation scripts and in the OpenMV MicroPython scripts:

logits: array([-16.358, 15.7667], dtype=float32) # OpenMV
logits:  [[-67  57]] # Python

I also checked the TFLite model’s tensor details and confirmed that all operators are quantized to INT8 or INT16. I wonder why the output of the quantized model on OpenMV is float32, and why the results differ so much from the TensorFlow validation Python scripts.

The following are the scripts I used:

# Validation on PC

import math
import numpy as np
import torch
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path='/home/user/detect_1000_hw100_int8.tflite')
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()  
output_details = interpreter.get_output_details()  

print("Checking TFLite model tensor details...")
for detail in interpreter.get_tensor_details():
    print(f"Tensor Name: {detail['name']}, Type: {detail['dtype']}")

# num_samples, extracted_images and fivecrop_scale are defined elsewhere in my script
test_dataset = []
for i in range(num_samples):
    img = torch.tensor(extracted_images[i], dtype=torch.float32)
    img = fivecrop_scale(img, crop_size=450)  # scale
    img_array = img.numpy()
    scale, zero_point = input_details[0]['quantization']
    img_int8 = np.round(img_array / scale + zero_point).astype(np.int8)  # quant input to int8
    img_int8 = np.expand_dims(img_int8, axis=0)  
    test_dataset.append(img_int8)

def run_inference_tflite(image_np):  
    interpreter.set_tensor(input_details[0]['index'], image_np)
    interpreter.invoke()  
    output_data = interpreter.get_tensor(output_details[0]['index'])  
    return output_data  

all_preds = []
for i, img in enumerate(test_dataset, start=1):
    pred = run_inference_tflite(img)
    print('index', i, 'Pred', math.ceil(1 / (1 + np.exp(-pred[0][1]))))
    all_preds.append(pred)
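For reference, the raw int8 logits from the PC interpreter can be dequantized to floats with the output tensor’s quantization parameters, which makes them directly comparable to OpenMV’s float output; a minimal sketch using the standard TFLite API:

out_scale, out_zero_point = output_details[0]['quantization']
float_logits = (all_preds[0].astype(np.float32) - out_zero_point) * out_scale  # real = (q - zp) * scale
print(float_logits)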



# Validation on OpenMV

import ml

model = ml.Model("detect_1000_hw100_int8.tflite", load_to_fb=True)
predicted_class = []
num_iterations = 10

for iteration in range(num_iterations):
    if iteration < 7:
        print('collecting ECG data')
        # ECG_dataset and probability() are defined elsewhere in my script
        samples = ECG_dataset.__getitem__()
        for sample in samples:
            logits = model.predict([sample])
            predicted_class.append(probability(logits))

Here are the quantized model’s tensor details:

# Scripts:
print("Checking TFLite model tensor details...")
for detail in interpreter.get_tensor_details():
    print(f"Tensor Name: {detail['name']}, Type: {detail['dtype']}")

# OUTPUT:
Checking TFLite model tensor details...
Tensor Name: serving_default_input:0, Type: <class 'numpy.int8'>
Tensor Name: transpose_1/perm, Type: <class 'numpy.int32'>
Tensor Name: transpose_10/perm, Type: <class 'numpy.int32'>
Tensor Name: Const, Type: <class 'numpy.int32'>
Tensor Name: cond/transpose, Type: <class 'numpy.int32'>
Tensor Name: split_14, Type: <class 'numpy.int32'>
Tensor Name: flatten/Reshape/shape, Type: <class 'numpy.int32'>
Tensor Name: convolution, Type: <class 'numpy.int8'>
Tensor Name: Add;convolution;Const_1, Type: <class 'numpy.int32'>
Tensor Name: convolution_1, Type: <class 'numpy.int8'>
Tensor Name: Add_1;convolution;convolution_1;Const_3, Type: <class 'numpy.int32'>
Tensor Name: convolution_2, Type: <class 'numpy.int8'>
Tensor Name: Add_2;convolution_2;Const_5, Type: <class 'numpy.int32'>
Tensor Name: convolution_3, Type: <class 'numpy.int8'>
Tensor Name: Add_3;convolution;convolution_3;Const_7, Type: <class 'numpy.int32'>
Tensor Name: convolution_4, Type: <class 'numpy.int8'>
Tensor Name: Add_4;convolution;convolution_4;Const_9, Type: <class 'numpy.int32'>
Tensor Name: convolution_5, Type: <class 'numpy.int8'>
Tensor Name: Add_5;convolution_2;convolution_5;Const_11, Type: <class 'numpy.int32'>
Tensor Name: convolution_6, Type: <class 'numpy.int8'>
Tensor Name: Add_6;convolution;convolution_6;Const_13, Type: <class 'numpy.int32'>
Tensor Name: convolution_7, Type: <class 'numpy.int8'>
Tensor Name: Add_7;convolution;convolution_7;Const_15, Type: <class 'numpy.int32'>
Tensor Name: convolution_8, Type: <class 'numpy.int8'>
Tensor Name: Add_8;convolution_2;convolution_8;Const_17, Type: <class 'numpy.int32'>
Tensor Name: convolution_9, Type: <class 'numpy.int8'>
Tensor Name: Add_9;convolution;convolution_9;Const_19, Type: <class 'numpy.int32'>
Tensor Name: convolution_10, Type: <class 'numpy.int8'>
Tensor Name: Add_10;convolution_10;Const_21, Type: <class 'numpy.int32'>
Tensor Name: convolution_11, Type: <class 'numpy.int8'>
Tensor Name: Add_11;convolution_10;convolution_11;Const_23, Type: <class 'numpy.int32'>
Tensor Name: convolution_12, Type: <class 'numpy.int8'>
Tensor Name: Add_12;convolution_12;Const_25, Type: <class 'numpy.int32'>
Tensor Name: convolution_13, Type: <class 'numpy.int8'>
Tensor Name: Add_13;convolution_10;convolution_13;Const_27, Type: <class 'numpy.int32'>
Tensor Name: MatMul, Type: <class 'numpy.int8'>
Tensor Name: Const_30, Type: <class 'numpy.int32'>
Tensor Name: transpose_1, Type: <class 'numpy.int8'>
Tensor Name: Add;convolution;Const_11, Type: <class 'numpy.int8'>
Tensor Name: transpose_2, Type: <class 'numpy.int8'>
Tensor Name: Pad, Type: <class 'numpy.int8'>
Tensor Name: transpose_4, Type: <class 'numpy.int8'>
Tensor Name: onnx_tf_prefix_Relu_2;Add_1;convolution;convolution_1;Const_3, Type: <class 'numpy.int8'>
Tensor Name: transpose_5, Type: <class 'numpy.int8'>
Tensor Name: onnx_tf_prefix_Relu_6;Add_2;convolution_2;Const_5, Type: <class 'numpy.int8'>
Tensor Name: Add_3;convolution;convolution_3;Const_71, Type: <class 'numpy.int8'>
Tensor Name: cond/onnx_tf_prefix_Pad_3, Type: <class 'numpy.int8'>
Tensor Name: transpose_6, Type: <class 'numpy.int8'>
Tensor Name: avg_pool, Type: <class 'numpy.int8'>
Tensor Name: onnx_tf_prefix_Add_8, Type: <class 'numpy.int8'>
Tensor Name: transpose_13, Type: <class 'numpy.int8'>
Tensor Name: Pad_1, Type: <class 'numpy.int8'>
Tensor Name: transpose_15, Type: <class 'numpy.int8'>
Tensor Name: onnx_tf_prefix_Relu_10;Add_4;convolution;convolution_4;Const_9, Type: <class 'numpy.int8'>
Tensor Name: transpose_16, Type: <class 'numpy.int8'>
Tensor Name: onnx_tf_prefix_Relu_14;Add_5;convolution_2;convolution_5;Const_11, Type: <class 'numpy.int8'>
Tensor Name: Add_6;convolution;convolution_6;Const_131, Type: <class 'numpy.int8'>
Tensor Name: cond_1/onnx_tf_prefix_Pad_11, Type: <class 'numpy.int8'>
Tensor Name: transpose_17, Type: <class 'numpy.int8'>
Tensor Name: avg_pool_1, Type: <class 'numpy.int8'>
Tensor Name: onnx_tf_prefix_Add_16, Type: <class 'numpy.int8'>
Tensor Name: transpose_24, Type: <class 'numpy.int8'>
Tensor Name: Pad_2, Type: <class 'numpy.int8'>
Tensor Name: transpose_26, Type: <class 'numpy.int8'>
Tensor Name: onnx_tf_prefix_Relu_18;Add_7;convolution;convolution_7;Const_15, Type: <class 'numpy.int8'>
Tensor Name: transpose_27, Type: <class 'numpy.int8'>
Tensor Name: onnx_tf_prefix_Relu_22;Add_8;convolution_2;convolution_8;Const_17, Type: <class 'numpy.int8'>
Tensor Name: Add_9;convolution;convolution_9;Const_191, Type: <class 'numpy.int8'>
Tensor Name: cond_2/onnx_tf_prefix_Pad_19, Type: <class 'numpy.int8'>
Tensor Name: transpose_28, Type: <class 'numpy.int8'>
Tensor Name: avg_pool_2, Type: <class 'numpy.int8'>
Tensor Name: onnx_tf_prefix_Add_24, Type: <class 'numpy.int8'>
Tensor Name: Add_10;convolution_10;Const_211, Type: <class 'numpy.int8'>
Tensor Name: transpose_38, Type: <class 'numpy.int8'>
Tensor Name: Pad_3, Type: <class 'numpy.int8'>
Tensor Name: transpose_40, Type: <class 'numpy.int8'>
Tensor Name: onnx_tf_prefix_Relu_27;Add_11;convolution_10;convolution_11;Const_23, Type: <class 'numpy.int8'>
Tensor Name: transpose_41, Type: <class 'numpy.int8'>
Tensor Name: onnx_tf_prefix_Relu_31;Add_12;convolution_12;Const_25, Type: <class 'numpy.int8'>
Tensor Name: Add_13;convolution_10;convolution_13;Const_271, Type: <class 'numpy.int8'>
Tensor Name: cond_3/onnx_tf_prefix_Pad_28, Type: <class 'numpy.int8'>
Tensor Name: transpose_42, Type: <class 'numpy.int8'>
Tensor Name: avg_pool_3, Type: <class 'numpy.int8'>
Tensor Name: onnx_tf_prefix_Add_33, Type: <class 'numpy.int8'>
Tensor Name: transpose_49, Type: <class 'numpy.int8'>
Tensor Name: Mean, Type: <class 'numpy.int8'>
Tensor Name: flatten/Reshape;onnx_tf_prefix_Reshape_40, Type: <class 'numpy.int8'>
Tensor Name: PartitionedCall:0, Type: <class 'numpy.int8'>
Tensor Name: , Type: <class 'numpy.int32'>
Tensor Name: , Type: <class 'numpy.int32'>
Tensor Name: , Type: <class 'numpy.int32'>
Tensor Name: , Type: <class 'numpy.int8'>
Tensor Name: , Type: <class 'numpy.int8'>
Tensor Name: , Type: <class 'numpy.int8'>
Tensor Name: , Type: <class 'numpy.int8'>
Tensor Name: , Type: <class 'numpy.int8'>
Tensor Name: , Type: <class 'numpy.int8'>

Hi, our library always outputs floating-point values. It applies the scale/offset from the last layer of the model to the INT8 output to turn it into floating point.

What are the scale/offset values of the last layer of the model?
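On the PC side you can read them straight from the interpreter; a minimal sketch using the standard TFLite Python API:

out_scale, out_zero_point = interpreter.get_output_details()[0]['quantization']
print(out_scale, out_zero_point)  # (scale, zero_point) of the output tensor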

Hi! You mean the scale and zero_point? They’re:

scale = 0.19708408415317535
zero_point = -4

The code will apply this:
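In essence, each INT8 output value q is dequantized as (q - zero_point) * scale; a Python sketch of that step, using your values:

scale, zero_point = 0.19708408415317535, -4
for q in (-67, 57):  # the raw int8 logits from your PC run
    print((q - zero_point) * scale)  # -> about -12.42 and 12.02

Note that (-67 - (-4)) * 0.1971 is about -12.42, which is not the -16.36 you observed on OpenMV.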

However, on checking your output it doesn’t match what’s expected. The same kind of process goes on for the input too:
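Roughly, in Python terms (the actual C loop is quoted in full later in this thread), each element of an ndarray input is quantized with the first layer’s parameters:

def quantize_input(values, input_scale, input_zero_point):
    # Per-element input quantization the library applies to ndarray inputs
    return [int(v * input_scale + input_zero_point) for v in values]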

What are your input and output expectations? You should be applying the scale/offset when inputting data, and the same when outputting.

I’ve applied quantization for my input:

def quantize_pixel(pixel):
    """
    q_val = round(pixel / SCALE + ZERO_POINT)
    q_val = clamp(q_val, -128, 127)
    """
    # SCALE = 0.003921532537788153
    SCALE = 1
    ZERO_POINT = -128
    q_val = pixel * (1.0 / SCALE) + ZERO_POINT  # pixel / scale + zero_point

    # clamp to [-128, 127]
    if q_val > 127:
        q_val = 127
    elif q_val < -128:
        q_val = -128

    return int(q_val)

    img = image.Image(img_file, copy_to_fb=True)

    # Preprocess
    h = img.height()
    w = img.width()
    x_scale = self.crop_size / h
    y_scale = self.crop_size / w
    img.to_rgb565(x_scale = x_scale,
                  y_scale = y_scale,
                  rgb_channel = -1,
                  alpha = 255,
                  color_palette=None,
                  alpha_palette=None,
                  hint = 0,
                  copy = False,
                  copy_to_fb = False)
    buf = np.zeros((1, 3, self.crop_size, self.crop_size), dtype=np.int8)
    for y in range(img.height()):
        for x in range(img.width()):
            (r, g, b) = img.get_pixel(x, y, rgbtuple=True)
            q_r = quantize_pixel(r)
            q_g = quantize_pixel(g)
            q_b = quantize_pixel(b)
            buf[0, 0, x, y] = q_r
            buf[0, 1, x, y] = q_g
            buf[0, 2, x, y] = q_b

In the Python scripts, I apply the scale/offset because the original inputs are normalized to [0, 1]. However, in the OpenMV scripts the input is RGB565, so I apply the scale/offset as:

q_val = pixel * (1.0 / 1) - 128

for every single pixel.
For one specific example, the output should be

INPUT : INT8 image
OUTPUT logits:  [[-67  57]]

But in OpenMV inference it becomes

INPUT : RGB565 image
OUTPUT logits: array([-14.7813, 13.993], dtype=float32)

So… you’re passing the ndarray directly to predict(), right? If so, it only passes through the input quantization loop (the C code is quoted in full later in this thread), where input_scale and input_zero_point come from the first layer of the model.

model_input_8[i] is then the first layer of the model in int8 form. Note that ndarray_get_float_index() returns the float representation of whatever the underlying type of the ndarray is.

When the last layer executes, it will come out here: openmv/src/omv/modules/py_ml.c at master · openmv/openmv · GitHub

and be turned into a floating-point ndarray using the last layer’s zero_point and scale.

Quantizing the pixels in the ndarray yourself doesn’t make sense… the library will quantize them again.

Note that for images, the ndarray input method is bypassed and a different code path runs instead, which converts the image to an ndarray via to_ndarray() before feeding it to the model.

Can you post some sort of test case, model, input image, expected output? We’ve debugged this code with quite a few networks, though… so this is probably an expectation mismatch somewhere.

Thank you for your reply!

Here’s a test case.
Let’s start from a "sample.csv" input, which consists of the RGB tuples of a 100×100 image.
Here’s my Python code for two-class classification:

import numpy as np
import tensorflow as tf

def load_csv_image(path, height, width):
    data = []
    with open(path, 'r') as f:
        for line in f:
            r, g, b = [int(v) for v in line.strip().split(',')]
            data.append([r, g, b])
    # 3D array (height x width x RGB)
    img_np_array = np.zeros((height, width, 3), dtype=np.int8)
    index = 0
    for y in range(height):
        for x in range(width):
            img_np_array[y, x, 0] = data[index][0] - 128
            img_np_array[y, x, 1] = data[index][1] - 128
            img_np_array[y, x, 2] = data[index][2] - 128
            index += 1
    return img_np_array

img_np = load_csv_image('sample.csv', 100, 100)
interpreter = tf.lite.Interpreter(model_path="detect_h100_int8_NHWC.tflite")
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()
img_int8 = np.expand_dims(img_np, axis=0)
interpreter.set_tensor(input_details[0]['index'], img_int8)
interpreter.invoke()
pred = interpreter.get_tensor(output_details[0]['index'])

Printing pred gives:

[[ 9.104117 -8.576343]]

The first logit corresponds to the negative class, the second to the positive class.
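For reference, the two logits can be converted to class probabilities with a standard softmax; a minimal sketch:

def softmax(logits):
    # Shift by the max for numerical stability before exponentiating
    e = np.exp(logits - np.max(logits))
    return e / e.sum()

print(softmax(np.array([9.104117, -8.576343])))  # ~[1.0, 2e-8] -> negative class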
On OpenMV, the code is:

import ml
from ulab import numpy as np

def load_csv_image(img_file, h, w):
    # Preprocess
    buf = np.zeros((1, h, w, 3), dtype=np.int8)
    data = []
    with open(img_file, 'r') as f:
        for line in f:
            r, g, b = [int(v) for v in line.strip().split(',')]
            data.append([r, g, b])
    idx = 0
    for y in range(h):
        for x in range(w):
            r, g, b = data[idx]
            buf[0, y, x, 0] = r - 128
            buf[0, y, x, 1] = g - 128
            buf[0, y, x, 2] = b - 128
            idx += 1

    valid_samples = []
    valid_samples.append(buf)
    del buf, data
    return valid_samples

h,w = 100, 100
model = ml.Model("detect_h100_int8_NHWC.tflite", load_to_fb=False)
sample = load_csv_image('sample.csv', h, w)
pred = model.predict(sample)

Printing pred gives:

[array([[-16.361, 17.2846]], dtype=float32)]

which does not match the PC prediction of [[ 9.104117 -8.576343]] for identical input.
The sample.csv and .tflite model can be downloaded from the zip.
testcase.zip (87.3 KB)

By the way, my PC virtual environment uses:

tensorflow              2.5.0              
tensorflow-addons       0.21.0             
tensorflow-estimator    2.5.0              
termcolor               1.1.0              
tflite-runtime          2.7.0 

I’m unsure whether this discrepancy is due to the tflite converter versions…

Debugging now; it will be a few days. A few things are in the queue ahead of this.


I haven’t forgotten you; I’m releasing new firmware/IDE tomorrow, and then I will be able to take a look at this.


Hi, sorry for taking so long to debug this. I had to get the IDE release out.

Okay, so the issue is that your model specifies a scale and offset:

{ model_size: 72384, model_addr: 0x30006b80, ram_size: 70320, ram_addr: 0x30018650, input_shape: ((1, 100, 100, 3),), input_scale: (0.00392157,), input_zero_point: (-128,), input_dtype: ('b',), output_shape: ((1, 2),), output_scale: (1.0,), output_zero_point: (0,), output_dtype: ('f',) }
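Those input parameters mean that an int8 value q fed to the model represents the real value (q - (-128)) * (1/255) = (q + 128) / 255, i.e. exactly the [0, 1] range you trained with; a quick check:

scale, zero_point = 0.00392157, -128
for q in (-128, 0, 127):
    print((q - zero_point) * scale)  # -> 0.0, ~0.502, ~1.0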

When you pass int8 values to the model, we will do this:

uint8_t *model_input_8 = (uint8_t *) input_buffer;
for (size_t i = 0; i < input_array->len; i++) {
    float value = ndarray_get_float_index(input_array->array, input_array->dtype, i);
    model_input_8[i] = (uint8_t) ((value * input_scale) + input_zero_point);
}
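Concretely, with input_scale = 1/255 and input_zero_point = -128, handing us pixel values you already offset by -128 gets the transform applied twice, while handing us r * 255 as floats comes out right; a quick check:

input_scale, input_zero_point = 1 / 255, -128
r = 200                                            # a raw 0-255 channel value
print((r - 128) * input_scale + input_zero_point)  # ~ -127.7: pre-offset input is mangled
print((r * 255) * input_scale + input_zero_point)  # 72.0: the int8 code the model wants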

In your TF code you are not bothering to handle the scale and offset. So, change your Python code to:

import ml
from ulab import numpy as np

def load_csv_image(img_file, h, w):
    # Preprocess
    buf = np.zeros((1, h, w, 3))
    data = []
    with open(img_file, 'r') as f:
        for line in f:
            r, g, b = [int(v) for v in line.strip().split(',')]
            data.append([r, g, b])
    idx = 0
    for y in range(h):
        for x in range(w):
            r, g, b = data[idx]
            buf[0, y, x, 0] = r * 255
            buf[0, y, x, 1] = g * 255
            buf[0, y, x, 2] = b * 255
            idx += 1

    valid_samples = []
    valid_samples.append(buf)
    del buf, data
    return valid_samples

h,w = 100, 100
model = ml.Model("detect_h100_int8_NHWC.tflite", load_to_fb=False)
print(model)
sample = load_csv_image('sample.csv', h, w)
pred = model.predict(sample)
print(pred)
>>> { model_size: 72384, model_addr: 0x30006ac0, ram_size: 70320, ram_addr: 0x30018590, input_shape: ((1, 100, 100, 3),), input_scale: (0.00392157,), input_zero_point: (-128,), input_dtype: ('b',), output_shape: ((1, 2),), output_scale: (1.0,), output_zero_point: (0,), output_dtype: ('f',) }
[array([[9.368, -8.84023]], dtype=float32)]

By changing the input to floats and pre-multiplying by 255, you cancel out our division by 255 (the input scale the model requests), and then we also subtract 128 for you, since that offset is already in the model spec.

Another alternative is to make the model not request scaling and offset changes to its input. Note that for floating-point model inputs/outputs we do not apply a scale/offset.

Finally, note that we only do this scale/offset correction for ndarray inputs. If you pass an image object directly (RGB565 for example), we’d do the correct thing without you having to think at all about this.

E.g., models with int8 image inputs pretty much always want 0-255 RGB888 values minus 128. And if it’s a uint8 input, then it’s just the 0-255 values directly.

Of course… this is not always the case. After looking through quite a few models, we’ve come to understand that the scale/offset handling is generally poorly implemented, with exceptions to it all over the place.

Using the image interface:

import ml
from ulab import numpy as np
import image

def load_csv_image(img_file, h, w):
    # Preprocess
    buf = np.zeros((h, w, 3))
    data = []
    with open(img_file, 'r') as f:
        for line in f:
            r, g, b = [int(v) for v in line.strip().split(',')]
            data.append([r, g, b])
    idx = 0
    for y in range(h):
        for x in range(w):
            r, g, b = data[idx]
            buf[y, x, 0] = r
            buf[y, x, 1] = g
            buf[y, x, 2] = b
            idx += 1

    valid_samples = []
    valid_samples.append(buf)
    del buf, data
    return valid_samples

h,w = 100, 100
model = ml.Model("detect_h100_int8_NHWC.tflite", load_to_fb=False)
print(model)
sample = load_csv_image('sample.csv', h, w)
img = image.Image(sample[0])
pred = model.predict([img])
print(pred)
>>> { model_size: 72384, model_addr: 0x30006be0, ram_size: 70320, ram_addr: 0x300186c0, input_shape: ((1, 100, 100, 3),), input_scale: (0.00392157,), input_zero_point: (-128,), input_dtype: ('b',), output_shape: ((1, 2),), output_scale: (1.0,), output_zero_point: (0,), output_dtype: ('f',) }
[array([[8.57634, -8.18051]], dtype=float32)]

There’s some minor loss in accuracy because of the RGB888->RGB565->RGB888 conversion here.
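Each 8-bit channel is truncated to 5 bits (R), 6 bits (G), or 5 bits (B) and then expanded back, so a channel value can shift by a few counts; a rough illustration of the 8->5->8-bit round trip (the exact firmware conversion may differ):

def rgb565_roundtrip_r(r8):
    # 8-bit -> 5-bit truncation, then a typical 5-bit -> 8-bit expansion
    r5 = r8 >> 3
    return (r5 << 3) | (r5 >> 2)

print(rgb565_roundtrip_r(200))  # -> 206: off by 6 after the round trip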

During training, the model input’s dtype is float32, with values ranging from 0 to 1 (I trained the model in PyTorch). I copied the .pth weights and biases to a TF model and then quantized the TF SavedModel to a tflite model. When deploying the quantized tflite model on OpenMV, the classification results for the same input CSV file on PC and on OpenMV were quite different, so I checked the output of the first Conv layer and found that the results on PC and OpenMV differ there too.

Please see my first post. I got the same results as you after applying scaling/offset fixes.
