Google targets new chips to boost AI speed, taking on Nvidia
In just a few months, Google’s AI chips have become one of the most sought-after commodities in the technology sector. Prominent artificial intelligence developers, along with several of the company’s major competitors, are stockpiling them.

Now the Alphabet Inc. company is seeking to capitalize on that momentum with the anticipated launch of new chips designed specifically for inference — the process of running AI models after they have been trained. With this push, Google will further challenge market leader Nvidia Corp. in a rapidly expanding semiconductor category driven by the growing adoption of AI software.

With demand rising for rapid processing of AI queries, “it now becomes sensible to specialize chips more for training or more for inference workloads,” Google Chief Scientist Jeff Dean said in an interview. “We are looking at a whole bunch of different things,” he added, including the speed of AI results the company wants to enable.

Google is set to unveil its latest generation of custom-designed chips, known as tensor processing units, or TPUs, at the Google Cloud Next conference in Las Vegas this week. Amin Vahdat, who oversees Google’s AI infrastructure and chip development, declined to discuss plans for an inference chip designed to accelerate AI outputs, but he indicated that more information will likely be revealed “in the relatively near future.”
Nvidia’s graphics processing units, or GPUs, remain the benchmark for AI, especially for training the most sophisticated models. But a rising cohort of challengers is competing with the chipmaker for inference applications, notably by offering chips designed to cut response times for chatbots and AI agents. Last month, Nvidia began selling a chip designed for accelerated inference, leveraging technology obtained from Groq in a reported $20 billion licensing agreement.

Google has distinctive advantages in that competitive landscape: a decade of experience in chip design, deep resources from its online search profits, and firsthand insight into AI models. Among the leading AI developers, only Google produces its own chips at significant scale, allowing teams to share crucial feedback that improves hardware customization. (OpenAI is just beginning to develop its own.) In a recent podcast interview, Nvidia’s Jensen Huang emphasized the benefits of his company’s chips, saying they can perform “a whole bunch of applications” that “you can’t do with TPUs.”

Google uses a combination of TPUs and GPUs for its operations. “A lot of people would like to run on both,” Demis Hassabis, the head of Google DeepMind, said in an interview. Interest in TPUs is particularly high from leading AI labs, he said.
Google has long highlighted the inference capabilities of its chips. According to Partha Ranganathan, a vice president and engineering fellow at Google, the company also considered releasing separate chips for training and inference early on, but so far it has resisted that approach. That may soon change as the surge in AI investment shifts from training to inference.

“The battleground is shifting towards inference,” said analyst Chirag Dekate, noting that Google’s Gemini model excels at responding swiftly to complex reasoning tasks. “In that battleground, Google has an infrastructure advantage.”

According to Natalie Serrino, today’s TPUs are already a compelling option for processing results for the new generation of AI agents that handle increasingly complex tasks on behalf of users. “They are very good tools for the workload that is exploding,” she said.

In October, Google’s chip efforts drew renewed attention when Anthropic PBC — a prominent AI developer — announced an expanded agreement to access up to 1 million TPUs. The following month, Google introduced its more sophisticated Gemini 3 model, which was trained and runs on TPUs, to enthusiastic acclaim. Since then, demand for Google’s chips has continued to grow among major corporations.
Meta Platforms Inc. has struck a multibillion-dollar agreement to use TPUs via Google Cloud over the coming years. The company just received access to its first significant supply and is testing the chips to see what tasks they’re best suited for, said Santosh Janardhan, Meta’s head of infrastructure. “It does look like there might be inference advantages,” he said, while noting that no new platform is without hurdles and a learning curve.

Anthropic has reached an agreement with Broadcom Inc., Google’s TPU partner, to acquire chips providing access to approximately 3.5 gigawatts of computing power beginning in 2027. Citadel Securities plans to showcase at the Google conference how TPUs enabled the company to train models faster than earlier efforts using GPUs. And G42, the Abu Dhabi technology conglomerate, has held “multiple discussions” with Google about using its TPUs, said Talal Al Kaissi. “I’m very bullish,” Al Kaissi said of the talks.

Google is also taking new steps to meet customers where they are. The company is exploring letting firms such as Anthropic run some of their TPUs in their own data centers rather than in Google’s facilities, according to a person familiar with the matter. Google has also enabled TPU customers to use outside tools like PyTorch, as well as other scheduling software, rather than relying solely on Google’s products, Vahdat said.

These changes are helping shift the perception of chips that originated from Google’s own computing bottlenecks and were long considered useful mainly for meeting the company’s own needs.
After Dean began developing an earlier AI software system to power language translation and voice recognition services, he realized that even Google could not afford to run it on the chips and hardware then available. At the same time, the central processing units Google relied on for AI were improving at a slower pace. The company resolved to build an accelerator focused on a narrower range of tasks that threatened to become the most expensive for AI.

The central concept of the TPU was that it “solves a small number of problems but the amount of computation required for them was enormous,” Vahdat said. “The prevailing belief during that period was that specialized hardware should not be developed.”

Over the years, Google’s TPUs have evolved in tandem with its AI efforts. A pivotal 2017 Google research paper that laid the groundwork for today’s large language models also steered the TPU team toward chips designed for training larger AI systems. Later, Google DeepMind and the chip team observed that TPUs often sat idle when used for reinforcement learning, a widely used approach for improving AI systems at particular tasks. In response, the TPU team changed how it networks different semiconductors, improving data flow and keeping chips from sitting idle.
One ongoing discussion at Google concerns the optimal number of chips to connect within a single pod, and whether hardware can operate at reduced precision to cut costs. “A lot of those things are informed by the model experiments,” Hassabis said. In the future, he hopes the TPU team will explore building an accelerator for edge-of-network scenarios, where the chip sits closer to users rather than being accessed through the cloud, in order to minimize latency.

Along the way, Google has built systems to quickly identify manufacturing flaws that can significantly affect software. With AI accelerator chips that handle massive amounts of math, even a subtle failure can metastasize and cause a model to “completely self-destruct,” said Paul Barham, the Google distinguished scientist who co-leads the Gemini infrastructure team. An incident of that kind occurred at Google roughly two years ago and took weeks to diagnose, he said, calling such problems “bugs from hell.” “We now have to do that with hundreds of thousands of accelerator chips within 10 seconds,” he said.

Despite its prowess in AI development, Google faces a challenge familiar to other chipmakers: chips typically take about three years to fully develop, while AI models are advancing far faster. That makes it difficult to forecast what customers will want several years out. “If anybody claims they know what Gemini 10 is going to look like, I’m like, ‘Please give me whatever you’re smoking,’” Ranganathan said.
Barham worries that the tight feedback loop between the creators of AI models and the designers of hardware may cause innovative ideas to be overlooked. “There’s this cycle that traps you into what works well on the current software and hardware,” he said. The TPU team often aims for balance, making a chip effective enough across a range of applications even if it doesn’t excel at every one. The alternative, according to Vahdat, is to create two distinct designs; both may not ship, but they could if the use case for each proves compelling enough.

As Google’s chips gain popularity, the company faces potential supply constraints similar to those Nvidia has experienced. One startup executive, speaking on condition of anonymity to discuss internal matters, said their company’s use of TPUs has been limited by availability and voiced concern that Google had essentially allocated all of its chips to Anthropic. “Mostly we’re sort of favoring what supply we do have to the more elite teams who obviously are the ones that could maybe take the most advantage out of what the TPUs do best,” Hassabis said, referring to top AI firms.

Going forward, Google will have to decide how to distribute TPUs among its expanding range of competitive AI services and its growing list of clients. “There are benefits to making TPUs only for Google, but there are substantial downsides,” Vahdat said. “Eventually you find yourself on what we refer to as a tech island.” The island may be beautiful, but its population and diversity will inevitably be limited. Ultimately, it is likely to be of lower quality.





