Google Edge TPU – What is it? And How Does It Work?

In this article:

  1. Google Edge TPU - What is it? And How Does It Work?

Google Edge TPU - What is it? And How Does It Work?

Assured Systems supply a variety of products featuring Google Edge TPU for a wide range of applications. Learn more about what Google Edge TPU is, how it works and how it could benefit you.

Discuss your project requirements

AI (Artificial Intelligence) is technology that enables a computer to think or act in a more ‘human’ way by taking in information from its surroundings and deciding its response based on what it learns or senses – and from consumer to enterprise applications, AI is pervasive today.

With the booming growth of connected devices, combined with a demand for privacy/confidentiality, low latency and bandwidth constraints, AI models trained in the cloud increasingly need to be run at the edge – simply meaning that the device can run AI application independently without having to send any data back to a server using an internet connection.

So what is Google Edge TPU?

Google Edge TPU is Googles purpose-built ASIC (Application Specific Integrated Circuit – optimized to perform a specific kind of application) designed to run AI at the edge. It delivers high performance in a small physical and power footprint, enabling the deployment of high accuracy AI at the edge.

Google Edge TPU complements Cloud TPU and Google Cloud services to provide an end-to-end, cloud-to-edge, hardware and software infrastructure for facilitating the deployment of customers AI based solutions.

But how?

Well, The Edge TPU allows you to deploy high-quality ML inferencing at the edge, using various prototyping and production products from Coral.

The Coral platform for ML at the edge augments Google’s Cloud TPU and Cloud IoT to provide an end-to-end (cloud-to-edge, hardware and software) infrastructure to facilitate the deployment of customers’ AI-based solutions. The Coral platform provides a complete developer toolkit so you can compile your own models or retrain several Google AI models for the Edge TPU, combining Googles expertise in both AI and hardware.

Google Edge TPU complements CPUs, GPUs, FPGAs and other ASIC solutions for running AI at the edge.

Cloud Vs The Edge

Running code in the cloud means that you use CPUs, GPUs and TPUs of a company that makes those available to you via your browser. The main advantage of running code in the cloud is that you can assign the necessary amount of computing power for that specific code (training large models can take a lot of computation)

The edge is opposite of the cloud. It means that you are running your code on premise (which basically means that you are able to physically touch the device the code is running on). The main advantage of running code on the edge is that there is no network latency. As IoT devices usually generate frequent data, running code on the edge is perfect for IoT based solutions

Whats the difference between TPU, CPU and GPU?

A TPU ( Tensor Processing Unit) is another kind of processing unit like a CPU or a GPU. There are, however, some big differences between those – the biggest difference being that TPU is an ASIC. And as you probably already know, CPUs and GPUs are not ASICs as they are not optimized to do one specific kind of application.

Since we are comparing CPUs, GPUs and TPUs lets quickly look at how they respectively perform multiply-add operations with their architecture:

  • A CPU performs the multiply-add operation by reading each input and weight from memory, multiplying them with its ALU (Arithmetic Logic Unit), writing them back to memory and finally adding up all the multiplies values. Modern CPUs are strengthened by massive cache, branch prediction and high clock rate on each of its cores. Which all contribute to a lower latency of the CPU.
  • A GPU does however not use the fancy features which lower the latency. It also needs to orchestrate its thousands of ALUs which further decreases the latency. In short, GPUs drastically increase its throughput by parallelising its computation in exchange for an increase in its latency.
  • A TPU on the other hand operates very differently. Its ALUs are directly connected to each other without using the memory. They can directly give pass information which will drastically decrease latency.

In conclusion, the Edge TPU performs inference faster than any other processing unit architecture. It is not only faster, its also more eco-friendly by using quantization and using less memory operations.