At last year's I/O developer conference, Google revealed its own custom chips, built to accelerate its machine learning workloads. The company announced the name, Tensor Processing Unit, but said little about what the chips could actually do. The only other detail it shared was that the TPUs were optimized for TensorFlow, Google's own machine learning framework.
Now, roughly a year later, Google is sharing more data and benchmarks from the project. Chip designers can find the full specifications in Google's paper, though it's worth noting that the numbers the company reports are based on its own benchmarking.
According to Google's results, its TPUs run the company's regular machine learning workloads around 15 to 30 times faster than a standard GPU/CPU combination. And since power consumption also matters in a data center, the TPUs deliver 30 to 80 times higher TeraOps per watt.
These numbers refer to running machine learning models in production, however, not to training them, TechCrunch reported. Google says it began looking into how it could use TPUs in its data centers about a decade ago, but at the time few applications could benefit from such specialized hardware; most of the heavy workloads back then could make do with the spare capacity already available in the data centers.
Google said things only started to change in 2013, when the company projected that deep neural networks (DNNs) would become much more popular and would ultimately double the demand on its data centers, which would have been very expensive to meet with traditional CPUs. That is why the team started a project to build a custom ASIC for inference, with the goal of improving cost-performance by up to ten times compared to GPUs.