Brains, not the Matrix — on-device machine learning from Google I/O

William Liu
3 min read · May 16, 2019

What is the difference between the brain and the Matrix (as in the movie The Matrix)?

The brain is owned and run by each individual, animal or human. Learning and inference both happen within a single brain. Brains communicate with each other through language, writing, and gestures.

The Matrix is one collective superpower: everyone in it shares data, and the Matrix learns everyone's behavior, generalizes, and infers outcomes.

Today, most machine learning applications are like the Matrix: raw data is collected from all over the internet and aggregated to train a model, and inference is then performed on centralized servers.

The Matrix model is suitable for web applications. For on-device applications, it has several drawbacks:

  • Requires a network connection
  • High latency
  • Privacy concerns
  • Extra data usage

All of these drawbacks stem from the fact that learning and inference are done on the server.

Therefore, one thing I believe will make a difference is progress in on-device machine learning and inference.

With on-device machine learning and inference, the model and the data all stay on the device:

  • Works offline in many cases
  • Much lower latency, because there is no need to send data to the server and wait for the inference result
  • No raw data is sent to the server, which mitigates privacy concerns
  • Minimizes the amount of data that needs to be sent over the network

On-device machine learning and inference

At the recent Google I/O, Google presented several advances in on-device machine learning and inference. Previously, due to the limitations of tools and the computation capacity of devices, on-device machine learning and inference did not see wide adoption.

TensorFlow Lite

TensorFlow Lite can perform inference on-device. It can run on Google’s Edge TPU for fast, low-power inference on IoT devices. (Watch the video Introducing Google Coral and AI for Mobile and IoT devices.)

TensorFlow Lite can run quantized models that are optimized for devices, as sketched below.
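As a rough sketch of what this looks like in code (assuming TensorFlow 2.x and a placeholder model file named model_quant.tflite), loading and running a quantized model with the TensorFlow Lite interpreter could look like this:

```python
import numpy as np
import tensorflow as tf

# Load a quantized TFLite model ("model_quant.tflite" is a placeholder path).
interpreter = tf.lite.Interpreter(model_path="model_quant.tflite")
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Prepare a dummy input matching the model's expected shape and dtype.
input_shape = input_details[0]["shape"]
input_data = np.zeros(input_shape, dtype=input_details[0]["dtype"])

# Run inference entirely on-device: no network call is involved.
interpreter.set_tensor(input_details[0]["index"], input_data)
interpreter.invoke()
prediction = interpreter.get_tensor(output_details[0]["index"])
print(prediction)
```

The whole round trip happens in local memory, which is where the latency and offline benefits come from.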

At the moment, TensorFlow Lite cannot do on-device learning on the Edge TPU. However, Google has promised to make on-device training available by the end of this year.

Reducing model size

Mobile and IoT devices have limited memory and storage, network bandwidth is lower, and data costs are higher, so reducing model size is very important. Google claimed to have reduced the translation model to 250 kB, the speech detection model to 20 kB, and the voice recognition model to 500 MB.
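To give a rough idea of where such savings come from, here is a minimal post-training quantization sketch using the TensorFlow Lite converter (the toy Keras model is purely illustrative, not one of the models Google mentioned). Quantized weights are stored as 8-bit integers instead of 32-bit floats, so the file ends up roughly 4x smaller:

```python
import tensorflow as tf

# Build (or load) any Keras model; this toy model is just an illustration.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(20,)),
    tf.keras.layers.Dense(10, activation="softmax"),
])

# Convert without quantization.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
float_model = converter.convert()

# Convert again with post-training quantization enabled.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
quant_model = converter.convert()

# Compare serialized sizes: int8 weights vs. float32 weights.
print("float model:", len(float_model), "bytes")
print("quantized model:", len(quant_model), "bytes")
```

The big reductions Google reported also rely on architecture changes and compression beyond simple quantization, but the principle of trading precision for size is the same.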

Federated Learning

If you think of each device as a brain, Federated Learning is like many people writing books about the same subject, which are then read by a clever person who generalizes their experience into a textbook.

Federated Learning trains on data locally on each device to produce a device-specific model update. Only the update is sent to the server, where updates from many devices are aggregated into the final model. This keeps raw, privacy-sensitive data off the server and reduces the amount of data that has to be transmitted.
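Here is a toy sketch of the federated averaging idea in plain NumPy. The linear model, the local_update and federated_round helpers, and the simulated devices are all illustrative assumptions, not Google's actual implementation; the point is only that devices exchange weight updates, never raw data:

```python
import numpy as np

def local_update(global_weights, local_data, local_labels, lr=0.1):
    """One device: train on its private data and return only a weight update."""
    # Toy linear model: gradient of mean squared error w.r.t. the weights.
    predictions = local_data @ global_weights
    gradient = local_data.T @ (predictions - local_labels) / len(local_labels)
    return -lr * gradient  # the update leaves the device, the raw data does not

def federated_round(global_weights, devices):
    """Server: average the updates coming back from each device."""
    updates = [local_update(global_weights, x, y) for x, y in devices]
    return global_weights + np.mean(updates, axis=0)

# Three simulated devices, each holding its own private dataset.
rng = np.random.default_rng(0)
devices = [(rng.normal(size=(32, 5)), rng.normal(size=32)) for _ in range(3)]

weights = np.zeros(5)
for _ in range(10):  # ten communication rounds
    weights = federated_round(weights, devices)
print(weights)
```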

Conclusion

We live in a world where most intelligence comes from the brains of individuals, and that is what gives society its diversity. I believe on-device learning will play a much more important role in bringing more diversity into machine learning and making machines more trustworthy.



Written by William Liu

Android and Ads developer at Instagram
