Not sure if you guys have seen this? GitHub - uTensor/uTensor: TinyML AI inference library
They have developed a way of turning a trained NN into a cpp file that runs on an MCU. The example they gave was a handwritten-digit-recognition CNN compressed to 32 KB and running on an MCU. From a quick read, it seems they quantize the 32-bit floats into 8-bit ints, which shrinks the NN to 25% of its original size and speeds up the matrix multiplications several-fold. See How to Quantize Neural Networks with TensorFlow « Pete Warden's blog
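To make the idea concrete, here's a minimal sketch of the min/max linear quantization scheme Warden's post describes: record the tensor's float range, then map each weight onto a 0-255 code. This is just an illustration of the technique, not the actual uTensor API; all names here are made up.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Illustrative 8-bit quantized tensor: stores 1-byte codes plus the
// float range they represent, so each weight costs 1/4 of a float.
struct QuantizedTensor {
    std::vector<uint8_t> values;  // 8-bit codes
    float min;                    // float value represented by code 0
    float max;                    // float value represented by code 255
};

QuantizedTensor quantize(const std::vector<float>& weights) {
    QuantizedTensor q;
    auto [lo, hi] = std::minmax_element(weights.begin(), weights.end());
    q.min = *lo;
    q.max = *hi;
    float scale = (q.max - q.min) / 255.0f;
    if (scale == 0.0f) scale = 1.0f;  // guard against a constant tensor
    q.values.reserve(weights.size());
    for (float w : weights) {
        // Round to the nearest code and clamp to the 8-bit range.
        float code = (w - q.min) / scale + 0.5f;
        q.values.push_back(static_cast<uint8_t>(std::min(code, 255.0f)));
    }
    return q;
}

// Recover an approximate float from its 8-bit code.
float dequantize(const QuantizedTensor& q, size_t i) {
    float scale = (q.max - q.min) / 255.0f;
    return q.min + q.values[i] * scale;
}
```

The 4x size claim follows directly from the storage: one byte per weight instead of four, plus two floats of range metadata per tensor. The speedup comes from doing the inner-loop multiplies on 8-bit ints, which MCUs handle much faster than software floating point.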