TPUs vs. GPUs and why Google is positioned to win AI race in the long term(uncoveralpha.com)
165 points by vegasbrianc 8 hours ago | 166 comments
  • m4r1k6 hours ago

    Google's real moat isn't the TPU silicon itself—it's not about cooling, individual performance, or hyper-specialization—but rather the massive parallel scale enabled by their OCS interconnects.

    To quote The Next Platform: "An Ironwood cluster linked with Google’s absolutely unique optical circuit switch interconnect can bring to bear 9,216 Ironwood TPUs with a combined 1.77 PB of HBM memory... This makes a rackscale Nvidia system based on 144 “Blackwell” GPU chiplets with an aggregate of 20.7 TB of HBM memory look like a joke."

    Nvidia may have the superior architecture at the single-chip level, but for large-scale distributed training (and inference) they currently have nothing that rivals Google's optical switching scalability.

    • thelastgallon6 hours ago |parent

      Also, Google owns the entire vertical stack, which is what most people need. It can provide an entire spectrum of AI services far cheaper, at scale (and still profitable) via its cloud. Not every company needs to buy the hardware and build models, etc., etc.; what most companies need is an app store of AI offerings they can leverage. Google can offer this with a healthy profit margin, while others will eventually run out of money.

      • jauntywundrkind4 hours ago |parent

        Google's work on Jax, pytorch, tensorflow, and the more general XLA underneath are exactly the kind of anti-moat everyone has been clamoring for.
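
        As a rough illustration of the portability point (a minimal sketch in JAX; shapes are arbitrary and nothing here is tied to a particular backend):

          # The same JAX code lowers through XLA to CPU, GPU, or TPU with no
          # device-specific changes -- the "anti-moat" argument in a nutshell.
          import jax
          import jax.numpy as jnp

          @jax.jit                       # XLA compiles this for whatever backend is present
          def layer(x, w):
              return jax.nn.relu(x @ w)  # one matmul plus an activation

          x = jnp.ones((8, 512))
          w = jnp.ones((512, 512))
          print(jax.devices())           # e.g. [CpuDevice(id=0)] or TPU devices
          print(layer(x, w).shape)       # (8, 512) regardless of the hardware underneath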

        • morkalork3 hours ago |parent

          Anti-moat like commoditizing the compliment?

          • sharpy2 hours ago |parent

            If they get things like PyTorch to work well without caring what hardware it is running on, it erodes Nvidia's CUDA moat. Nvidia's chips are excellent, without doubt, but their real moat is the ecosystem around CUDA.

            • qeternity2 hours ago |parent

              PyTorch is only part of it. There is still a huge amount of CUDA that isn’t just wrapped by PyTorch and isn’t easily portable.

              • svara an hour ago |parent

                ... but not in deep learning or am I missing something important here?

          • layer83 minutes ago |parent

            *complement

      • gigatexal3 hours ago |parent

        all this vertical integration no wonder Apple and Google have such a tight relationship.

    • mrbungie5 hours ago |parent

      It's fun when you then read the latest Nvidia tweet [1] suggesting that their tech is still better, based on pure vibes, like anything in the (Gen)AI era.

      [1] https://x.com/nvidianewsroom/status/1993364210948936055

      • qcnguy16 minutes ago |parent

        Not vibes. TPUs have fallen behind or had to be redesigned from scratch many times as neural architectures and workloads evolved, whereas the more general purpose GPUs kept on trucking and building on their prior investments. There's a good reason so much research is done on Nvidia clusters and not TPU clusters. TPU has often turned out to be over-specialized and Nvidia are pointing that out.

        • pests9 minutes ago |parent

          You say that like it's a bad thing. Nvidia architectures keep changing and getting more advanced as well, with specialized tensor operations, different accumulators and caches, etc. I see no issue with progress.

      • bigyabai42 minutes ago |parent

        > based on pure vibes

        The tweet gives their justification; CUDA isn't ASIC. Nvidia GPUs were popular for crypto mining, protein folding, and now AI inference too. TPUs are tensor ASICs.

        FWIW I'm inclined to agree with Nvidia here. Scaling up a systolic array is impressive but nothing new.

      • almostgotcaught3 hours ago |parent

        > NVIDIA is a generation ahead of the industry

        a generation is 6 months

        • wmf3 hours ago |parent

          For GPUs a generation is 1-2 years.

          • almostgotcaught2 hours ago |parent

            no https://en.wikipedia.org/wiki/List_of_Nvidia_graphics_proces...

            • Arainach2 hours ago |parent

              What in that article makes you think a generation is shorter?

              * Turing: September 2018

              * Ampere: May 2020

              * Hopper: March 2022

              * Lovelace (designed to work with Hopper): October 2022

              * Blackwell: November 2024

              * Next: December 2025 or later

              With a single exception for Lovelace (arguably not a generation), there are multiple years between generations.

    • villgax6 hours ago |parent

      100 times more chips for equivalent memory, sure.

      • m4r1k5 hours ago |parent

        Check the specs again. Per chip, TPU 7x has 192GB of HBM3e, whereas the NVIDIA B200 has 186GB.

        While the B200 wins on raw FP8 throughput (~9000 vs 4614 TFLOPs), that makes sense given NVIDIA has optimized for the single-chip game for over 20 years. But the bottleneck here isn't the chip—it's the domain size.

        NVIDIA's top-tier NVL72 tops out at an NVLink domain of 72 Blackwell GPUs. Meanwhile, Google is connecting 9216 chips at 9.6Tbps to deliver nearly 43 ExaFlops. NVIDIA has the ecosystem (CUDA, community, etc.), but until they can match that interconnect scale, they simply don't compete in this weight class.
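
        For anyone who wants to sanity-check the scale gap, here is the back-of-the-envelope arithmetic using the per-chip figures quoted in this thread (taken as given, not verified; the ~9000 FP8 number is Nvidia's sparse figure, as noted below):

          # Aggregate HBM and FP8 compute from the per-chip numbers above.
          ironwood = {"chips": 9216, "hbm_gb": 192, "fp8_tflops": 4614}
          nvl72    = {"chips": 72,   "hbm_gb": 186, "fp8_tflops": 9000}

          for name, s in {"Ironwood pod": ironwood, "NVL72 rack": nvl72}.items():
              hbm_pb   = s["chips"] * s["hbm_gb"] / 1e6        # GB -> PB (decimal)
              exaflops = s["chips"] * s["fp8_tflops"] / 1e6    # TFLOPS -> EFLOPS
              print(f"{name}: {hbm_pb:.2f} PB HBM, {exaflops:.2f} FP8 ExaFLOPS")

          # Ironwood pod: 1.77 PB HBM, 42.52 FP8 ExaFLOPS  -- matches the ~43 EF figure
          # NVL72 rack:   0.01 PB HBM, 0.65 FP8 ExaFLOPS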

        • cwzwarich3 hours ago |parent

          Isn’t the 9000 TFLOP/s number Nvidia’s relatively useless sparse FLOP count that is 2x the actual dense FLOP count?

        • PunchyHamster2 hours ago |parent

          Yet everyone uses NVIDIA and Google is in a catch-up position.

          Ecosystem is a MASSIVE factor and will remain a massive factor for all but the biggest models.

          • epolanski an hour ago |parent

            Catch-up in what exactly? Google isn't building hardware to sell, they aren't in the same market.

            Also, I feel you completely misunderstand: the problem isn't how fast ONE GPU is vs ONE TPU, what matters is the cost for the same output. If I can fill a datacenter at half the cost for the same output, does it matter that I've used twice the TPUs and that a single Nvidia Blackwell was faster? No...

            And hardware cost isn't even the biggest problem; operational costs, mostly power and cooling, are another huge one.

            So if you design a solution that fits your stack (designed for it) and optimizes for your operational costs, you're light years ahead of a competitor using the more powerful solution that costs 5 times more in hardware and twice as much to operate.

            All I'm saying is more or less true for inference economics; I have no clue about training.

            • butvacuum an hour ago |parent

              Also, isn't memory a bit moot? At scale I thought that the ASICs frequently sat idle waiting for memory.

              • pests5 minutes ago |parent

                You're doing operations on the memory once it's been transferred to gpu memory. Either shuffling it around various caches or processors or feeding it into tensor cores or other matrix operations. You don't want to be sitting idle.

      • croon5 hours ago |parent

        Ironwood is 192GB, Blackwell is 96GB, right? Or am i missing something?

      • NaomiLehman6 hours ago |parent

        I think it's not about the cost but the limits of quickly accessible RAM

  • 1980phipsi6 hours ago

    > It is also important to note that, until recently, the GenAI industry’s focus has largely been on training workloads. In training workloads, CUDA is very important, but when it comes to inference, even reasoning inference, CUDA is not that important, so the chances of expanding the TPU footprint in inference are much higher than those in training (although TPUs do really well in training as well – Gemini 3 the prime example).

    Does anyone have a sense of why CUDA is more important for training than inference?

    • qcnguy12 minutes ago |parent

      CUDA is just a better dev experience. Lots of training is experiments where developer/researcher productivity matters. Googlers get to use what they're given, others get to choose.

      Once you settle on a design then doing ASICs to accelerate it might make sense. But I'm not sure the gap is so big, the article says some things that aren't really true of datacenter GPUs (Nvidia dc gpus haven't wasted hardware on graphics related stuff for years).

    • augment_me an hour ago |parent

      NVIDIA chips are more versatile. During training, you might need to schedule things to the SFU (Special Function Unit that does sin, cos, 1/sqrt(x), etc.), you might need to run epilogues, save intermediary computations, save gradients, etc. When you train, you might need to collect data from various GPUs, so you need to support interconnects, remote SMEM writing, etc.

      Once you have trained, you have feed-forward networks consisting of frozen weights that you can just program in and run data over. These weights can be duplicated across any number of devices and just sit there running inference on new data.

      If this turns out to be the future use-case for NNs(it is today), then Google are better set.
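
      A minimal JAX sketch of that contrast (illustrative only; the loss, the SGD update, and the device axis are placeholders, not anyone's production setup):

        # Training: gradients, a cross-device all-reduce, and a weight update.
        # Inference: a single feed-forward pass over frozen weights.
        import jax
        import jax.numpy as jnp

        def loss(w, x, y):
            return jnp.mean((x @ w - y) ** 2)

        def train_step(w, x, y, lr=1e-3):
            g = jax.grad(loss)(w, x, y)
            g = jax.lax.pmean(g, axis_name="devices")  # average grads across accelerators
            return w - lr * g

        # One replica of the step per device; needs the interconnect underneath.
        p_train_step = jax.pmap(train_step, axis_name="devices")

        @jax.jit
        def infer(w, x):          # frozen weights, trivially replicated anywhere
            return x @ w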

      • grandmczeb40 minutes ago |parent

        All of those are things you can do with TPUs

    • rbanffy2 hours ago |parent

      This is a very important point - the market for training chips might be a bubble, but the market for inference is much, much larger. At some point we might have good enough models and the need for new frontier models will cool down. The big power-hungry datacenters we are seeing are mostly geared towards training, while inference-only systems are much simpler and power efficient.

      A real shame, BTW, that all that silicon doesn't do FP32 (very well). Once training is no longer in such demand, we could use all that number crunching for climate models and weather prediction.

    • Traster2 hours ago |parent

      Training is taking an enormous problem and trying to break it into lots of pieces and managing the data dependency between those pieces. It's solving 1 really hard problem. Inference is the opposite, it's lots of small independent problems. All of this "we have X many widgets connected to Y many high bandwidth optical telescopes" is all a training problem that they need to solve. Inference is "I have 20 tokens and I want to throw them at these 5,000,000 matrix multiplies, oh and I don't care about latency".

    • johnebgd6 hours ago |parent

      I think it’s the same reason Windows is important to desktop computers: software was written to depend on it. The same goes for most of the training software out there today being built around CUDA. Even a version difference of CUDA can break things.

    • llm_nerd6 hours ago |parent

      It's just more common as a legacy artifact from when nvidia was basically the only option available. Many shops are designing models and functions, and then training and iterating on nvidia hardware, but once you have a trained model it's largely fungible. See how Anthropic moved their models from nvidia hardware to Inferentia to XLA on Google TPUs.

      Further, it's worth noting that Ironwood, Google's v7 TPU, supports only up to BF16 (a 16-bit floating point type that has the range of FP32 minus the precision). Many training processes rely upon larger types, quantizing later, so this breaks a lot of assumptions. Yet Google surprised everyone and actually trained Gemini 3 with just that type, so I think a lot of people are reconsidering their assumptions.
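
      The "range of FP32, minus the precision" point is easy to see from the dtype metadata (a quick check via jax.numpy; the printed values are what finfo reports):

        # bf16 keeps FP32's 8 exponent bits (so roughly the same max value)
        # but only 8 bits of mantissa, vs 24 for FP32 and 11 for FP16.
        import jax.numpy as jnp

        for name, dt in [("float32", jnp.float32), ("bfloat16", jnp.bfloat16), ("float16", jnp.float16)]:
            fi = jnp.finfo(dt)
            print(f"{name:9s} max={float(fi.max):.3e}  eps={float(fi.eps):.3e}  mantissa bits={fi.nmant + 1}")

        # float32   max=3.403e+38  eps=1.192e-07  mantissa bits=24
        # bfloat16  max=3.390e+38  eps=7.812e-03  mantissa bits=8
        # float16   max=6.550e+04  eps=9.766e-04  mantissa bits=11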

      • qeternity an hour ago |parent

        This is not the case for LLMs. FP16/BF16 training precision is standard, with FP8 inference very common. But labs are moving to FP8 training and even FP4.

    • baby_souffle6 hours ago |parent

      That quote left me with the same question. Something about decent amount of ram on one board perhaps? That’s advantageous for training but less so for inference?

    • imtringued5 hours ago |parent

      When training a neural network, you usually play around with the architecture and need as much flexibility as possible. You need to support a large set of operations.

      Another factor is that training is always done with batches. Inference batching depends on the number of concurrent users. This means training tends to be compute bound where supporting the latest data types is critical, whereas inference speeds are often bottlenecked by memory which does not lend itself to product differentiation. If you put the same memory into your chip as your competitor, the difference is going to be way smaller.
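
      A crude arithmetic-intensity sketch of that compute-bound vs memory-bound split (all numbers illustrative):

        # For y = x @ W with W of shape (d, d): at batch 1 the weight read
        # dominates and the chip waits on memory; at a large training batch the
        # same weights are amortized over many rows and compute dominates.
        d = 8192             # hidden size (illustrative)
        bytes_per = 2        # bf16

        def flops_per_byte(batch):
            flops = 2 * batch * d * d                          # multiply-accumulates
            bytes_moved = bytes_per * (d * d + 2 * batch * d)  # weights + activations
            return flops / bytes_moved

        print(flops_per_byte(1))     # ~1 FLOP/byte     -> memory-bound (small-batch inference)
        print(flops_per_byte(4096))  # ~2048 FLOPs/byte -> compute-bound (training batch)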

    • NaomiLehman6 hours ago |parent

      Inference is often a static, bounded problem solvable by generic compilers. Training requires the mature ecosystem and numerical stability of CUDA to handle mixed-precision operations, unless you rewrite the software from the ground up like Google did; for most companies it's cheaper and faster to buy NVIDIA hardware.

      • never_inline5 hours ago |parent

        > static, bounded problem

        What does it even mean in neural net context?

        > numerical stability

        also nice to expand a bit.

  • sbarre6 hours ago

    A question I don't see addressed in all these articles: what prevents Nvidia from doing the same thing and iterating on their more general-purpose GPU towards a more focused TPU-like chip as well, if that turns out to be what the market really wants.

    • timmg6 hours ago |parent

      They will, I'm sure.

      The big difference is that Google is both the chip designer *and* the AI company. So they get both sets of profits.

      Both Google and Nvidia contract TSMC for chips. Then Nvidia sells them at a huge profit. Then OpenAI (for example) buys them at that inflated rate and then puts them into production.

      So while Nvidia is "selling shovels", Google is making their own shovels and has their own mines.

      • pzo2 hours ago |parent

        On top of that, Google is also a cloud infrastructure provider, unlike OpenAI, which needs someone like Azure to plug in those GPUs and host the servers.

      • 1980phipsi6 hours ago |parent

        Aka vertical integration.

      • m4rtink4 hours ago |parent

        So when the bubble pops, the companies making the shovels (TSMC, NVIDIA) might still have the money they got for their products, and some of the ex-AI companies might at least be able to sell standards-compliant GPUs on the wider market.

        And Google will end up with lots of useless super specialized custom hardware.

        • skybrian2 hours ago |parent

          It seems unlikely that large matrix multipliers will become useless. If nothing else, Google uses AI extensively internally. It already did in ways that weren’t user-visible long before the current AI boom. Also, they can still put AI overviews on search pages regardless of what the stock market does. They’re not as bad as they used to be, and I expect they’ll improve.

          Even if TPU’s weren’t all that useful, they still own the data centers and can upgrade equipment, or not. They paid for the hardware out of their large pile of cash, so it’s not debt overhang.

          Another issue is loss of revenue. Google cloud revenue is currently 15% of their total, so still not that much. The stock market is counting on it continuing to increase, though.

          If the stock market crashes, Google’s stock price will go down too, and that could be a very good time to buy, much like it was in 2008. There’s been a spectacular increase since then, the best investment I ever made. (Repeating that is unlikely, though.)

        • timmg3 hours ago |parent

          > And Google will end up with lots of useless super specialized custom hardware.

          If it gets to the point where this hardware is useless (I doubt it), yes Google will have it sitting there. But it will have cost Google less to build that hardware than any of the companies who built on Nvidia.

          • UncleOxidant2 hours ago |parent

            Right, and the inevitable bubble pop will just slow things down for a few years - it's not like those TPUs will suddenly be useless. Google will still have them deployed; it's just that instead of upgrading to a newer TPU they'll stay with the older ones longer. It seems like Google will face far fewer repercussions when the bubble pops than Nvidia, OpenAI, Anthropic, Oracle, etc., as they're largely staying out of the money circles between those companies.

          • immibis3 hours ago |parent

            aka Google will have less of a pile of money than Nvidia will

            • kolbe2 hours ago |parent

              Alphabet is the most profitable company in the world. For all the criticisms you can throw at Google, lacking a pile of money isn't one of them.

        • nutjob2 an hour ago |parent

          How could Google's custom hardware become useless? They've used it for their business for years now and will do so for years into the future. It's not like their hardware is LLM specific. Google cannot lose with their vast infrastructure.

          Meanwhile OpenAI et al dumping GPUs while everyone else is doing the same will get pennies on the dollar. It's exactly the opposite to what you describe.

          I hope that comes to pass, because I'll be ready to scoop up cheap GPUs and servers.

          • qcnguy9 minutes ago |parent

            Same way cloud hardware always risks becoming useless. The newer hardware is so much better you can't afford to not upgrade, e.g. an algorithmic improvement that can be run on CUDA devices but not on existing TPUs, which changes the economics of AI.

        • acoustics4 hours ago |parent

          I think people are confusing the bubble popping with AI being over. When the dot-com bubble popped, it's not like internet infrastructure immediately became useless and worthless.

          • iamtheworstdev3 hours ago |parent

            That's actually not all that true... a lot of fiber that had been laid went dark, or was never lit, and was hoarded by telecoms in an intentionally supply-constrained market in order to drive up the usage cost of what was lit.

            • pksebben an hour ago |parent

              If it was hoarded by anyone, then by definition not useless OR worthless. Also, you are currently on the internet if you're reading this, so the point kinda stands.

            • ithkuil2 hours ago |parent

              Are you saying that the internet business didn't grow a lot after the bubble popped?

            • bryanlarsen an hour ago |parent

              And then they sold it to Google who lit it up.

    • Workaccount26 hours ago |parent

      Deepmind gets to work directly with the TPU team to make custom modifications and designs specifically for deepmind projects. They get to make pickaxes that are made exactly for the mine they are working.

      Everyone using Nvidia hardware has a lot of overlap in requirements, but they also all have enough architectural differences that they won't be able to match Google.

      OpenAI announced they will be designing their own chips, exactly for this reason, but that also becomes another extremely capital intensive investment for them.

      This also doesn't get into the fact that Google also already has S-tier datacenters and datacenter construction/management capabilities.

      • wood_spirit3 hours ago |parent

        Isn’t there a suspicion that OpenAI buying custom chips from another Sam Altman venture is just graft? Wasn’t that one of the things that came up when the board tried to out him?

    • HarHarVeryFunny6 hours ago |parent

      It's not that the TPU is better than an NVidia GPU, it's just that it's cheaper since it doesn't have a fat NVidia markup applied, and is also better vertically integrated since it was designed/specified by Google for Google.

      • UncleOxidant2 hours ago |parent

        TPUs are also cheaper because GPUs need to be more general purpose, whereas TPUs are designed with a focus on LLM workloads, meaning there's no wasted silicon. Nothing's there that doesn't need to be there. The potential downside would be if a significantly different architecture arises that would be difficult for TPUs to handle and easier for GPUs (given their more general purpose). But even then Google could probably pivot fairly quickly to a different TPU design.

    • fooker6 hours ago |parent

      That's exactly what Nvidia is doing with tensor cores.

      • bjourne6 hours ago |parent

        Except the native width of Tensor Cores is about 8-32 (depending on scalar type), whereas the width of TPUs is up to 256. The difference in scale is massive.

    • LogicFailsMe6 hours ago |parent

      That's pretty much what they've been doing incrementally with the data center line of GPUs versus GeForce since 2017. Currently, the data center GPUs now have up to 6 times the performance at matrix math of the GeForce chips and much more memory. Nvidia has managed to stay one tape out away from addressing any competitors so far.

      The real challenge is getting the TPU to do more general purpose computation. But that doesn't make for as good a story. And the point about Google arbitrarily raising the prices as soon as they think they have the upper hand is good old fashioned capitalism in action.

    • jauntywundrkind2 hours ago |parent

      Nvidia doesn't have the software stack to do a TPU.

      They could make a systolic array TPU and software, perhaps. But it would mean abandoning 18 years of CUDA.

      The top post right now is talking about the TPU's colossal advantage in scaling & throughput. Ironwood is already massively bigger & faster than what Nvidia is shooting for. And that's a huge advantage. But imo that is a replicable win. Throw gobs more at networking and scaling and Nvidia could do similar with their architecture.

      The architectural win of what TPU is more interesting. Google sort of has a working super powerful Connection Machine CM-1. The systolic array is a lot of (semi-)independent machines that communicate with nearby chips. There's incredible work going on to figure out how to map problems onto these arrays.

      Whereas on a GPU, main memory is used to transfer intermediary results. It doesn't really matter who picks up work; there are lots of worklets with equal access time to that bit of main memory. The actual situation is a little more nuanced (even in consumer GPUs there are really multiple different main memories, which creates some locality), but there's much less need for data locality on a GPU. The whole premise of the TPU is to exploit data locality, under much tighter constraints: sending data to a neighbor is cheap, while storing and retrieving data from memory is slower and much more energy intensive.

      CUDA takes advantage of, and relies strongly on, the GPU's main memory being (somewhat) globally accessible. There are plenty of workloads folks do in CUDA that would never work on a TPU, on these much more specialized data-passing systolic arrays. That's why TPUs are so amazing: they are much more constrained devices that require so much more careful workload planning, to get the work to flow across the 2D array of the chip.
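
      A toy model of that data flow (purely illustrative, output-stationary, and ignoring the skewed timing of a real array): each cell keeps a local accumulator and only consumes operands handed along by its neighbors, never touching shared memory inside the loop.

        # Toy output-stationary systolic matmul: acc[i, j] lives in PE (i, j);
        # each step, operands flow in from the left/top neighbors.
        import numpy as np

        def systolic_matmul(A, B):
            n, k = A.shape
            _, m = B.shape
            acc = np.zeros((n, m))               # one local accumulator per PE
            for step in range(k):                # one wavefront per step
                for i in range(n):
                    for j in range(m):
                        acc[i, j] += A[i, step] * B[step, j]
            return acc

        A, B = np.random.rand(4, 3), np.random.rand(3, 5)
        assert np.allclose(systolic_matmul(A, B), A @ B)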

      Google's work on projects like XLA and IREE is a wonderful & glorious general pursuit of how to map these big crazy machine learning pipelines down onto specific hardware. Nvidia could make their own or join forces here. And perhaps they will. But the CUDA moat would have to be left behind.

    • blibble6 hours ago |parent

      the entire organisation has been built over the last 25 years to produce GPUs

      turning a giant lumbering ship around is not easy

      • sbarre6 hours ago |parent

        For sure, I did not mean to imply they could do it quickly or easily, but I have to assume that internally at Nvidia there's already work happening to figure out "can we make chips that are better for AI and cheaper/easier to make than GPUs?"

        • coredog642 hours ago |parent

          Isn't that a bit like Kodak knowing that digital cameras were a thing but not wanting to jeopardize their film business?

    • sojuz1516 hours ago |parent

      They lose the competitive advantage. They have nothing more to offer than what Google has in-house.

    • numbers_guy6 hours ago |parent

      Nothing in principle. But Huang probably doesn't believe in hyper specializing their chips at this stage because it's unlikely that the compute demands of 2035 are something we can predict today. For a counterpoint, Jim Keller took Tenstorrent in the opposite direction. Their chips are also very efficient, but even more general purpose than NVIDIA chips.

      • mindv0rtex3 hours ago |parent

        How is Tenstorrent h/w more general purpose than NVIDIA chips? TT hardware is only good for matmuls and some elementwise operations, and plain sucks for anything else. Their software is abysmal.

    • llm_nerd6 hours ago |parent

      For users buying H200s for AI workloads, the "ASIC" tensor cores deliver the overwhelming bulk of performance. So they already do this, and have been since Volta in 2017.

      To put it into perspective, the tensor cores deliver about 2,000 TFLOPs of FP8, and half that for FP16, and this is all tensor FMA/MAC (comprising the bulk of compute for AI workloads). The CUDA cores -- the rest of the GPU -- deliver more in the 70 TFLOP range.

      So if data centres are buying nvidia hardware for AI, they already are buying focused TPU chips that almost incidentally have some other hardware that can do some other stuff.

      I mean, GPUs still have a lot of non-tensor general uses in the sciences, finance, etc, and TPUs don't touch that, but yes a lot of nvidia GPUs are being sold as a focused TPU-like chip.

      • sorenjan6 hours ago |parent

        Is it the CUDA cores that run the vertex/fragment/etc. shaders in normal GPUs? Where do the ray tracing units fit in? How much of a modern Nvidia GPU is general purpose vs specialized to graphics pipelines?

        • qcnguy6 minutes ago |parent

          A datacenter GPU has next to nothing left related to graphics. You can't use them to render graphics. It's a pure computational kernel machine.

    • sofixa6 hours ago |parent

      > what prevents Nvidia from doing the same thing and iterating on their more general-purpose GPU towards a more focused TPU-like chip as well, if that turns out to be what the market really wants.

      Nothing prevents them per se, but it would risk cannibalising their highly profitable (IIRC 50% margin) higher end cards.

  • Shorel10 minutes ago

    They can only privatize the AI race.

    If Google wins, we all lose.

  • zenoprax6 hours ago

    I have read in the past that ASICs for LLMs are not as simple a solution compared to cryptocurrency. In order to design and build the ASIC you need to commit to a specific architecture: a hashing algorithm for a cryptocurrency is fixed but the LLMs are always changing.

    Am I misunderstanding "TPU" in the context of the article?

    • HarHarVeryFunny6 hours ago |parent

      Regardless of architecture (which is anyway basically the same for all LLMs), the computational needs of modern neural networks are pretty generic, centered around things like matrix multiply, which is what the TPU provides. There is even TPU support for some operations built into PyTorch - it is not just a proprietary interface that Google uses themselves.

    • kcb3 hours ago |parent

      LLMs require memory and interconnect bandwidth, so they need a whole package that is capable of feeding data to the compute. Crypto is 100% compute bound: a trivially parallelized application that runs the same calculation over N inputs.

    • olalonde4 hours ago |parent

      "Application-specific" doesn't necessarily mean unprogrammable. Bitcoin miners aren't programmable because they don't need to be. TPUs are ASICs for ML and need to be programmable so they can run different models. In theory, you could make an ASIC hardcoded for a specific model, but given how fast models evolve, it probably wouldn't make much economic sense.

    • immibis3 hours ago |parent

      Cryptocurrency architectures also change - Bitcoin is just about the lone holdout that never evolves. The hashing algorithm for Monero is designed so that a Monero hashing ASIC is literally just a CPU, and it doesn't even matter what the instruction set is.

    • p-e-w6 hours ago |parent

      It’s true that architectures change, but they are built from common components. The most important of those is matrix multiplication, using a relatively small set of floating point data types. A device that accelerates those operations is, effectively, an ASIC for LLMs.
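
      A rough FLOP count for one decoder layer makes the point concrete (shapes are illustrative; the non-matmul term is a very rough allowance for softmax/layernorm):

        # Where the FLOPs of one decoder layer go: almost entirely matmuls.
        d, s = 4096, 2048                     # hidden size, sequence length
        qkv      = 3 * (2 * s * d * d)        # Q, K, V projections
        attn     = 2 * (2 * s * s * d)        # QK^T and attention @ V
        out_proj = 2 * s * d * d
        mlp      = 2 * (2 * s * d * 4 * d)    # up- and down-projection (4x expansion)
        matmul = qkv + attn + out_proj + mlp
        other  = 20 * s * s + 10 * s * d      # rough softmax/layernorm allowance
        print(matmul / (matmul + other))      # ~0.9998 -- matmul dominates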

      • bfrog6 hours ago |parent

        We used to call these things DSPs

        • tuhgdetzhh5 hours ago |parent

          What is the difference between a DSP and an ASIC? Is a GPU a DSP?

          • bfrog5 hours ago |parent

            A DSP is simply a compute architecture that focuses on multiply and accumulate operations on particular numerical formats, often either fixed-point q15/q31 type values or floats f16/f32.

            The basic operation that an NN needs accelerated is... go figure, multiply and accumulate, with an activation function added.

            See for example how the Intel NPU is structured here: https://intel.github.io/intel-npu-acceleration-library/npu.h...
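
            The primitive being described fits in a few lines (illustrative Python, not any particular chip's ISA):

              # Multiply-accumulate over a vector, plus an activation bolted on:
              # conceptually what a DSP MAC unit (or one TPU cell) accelerates.
              def neuron(x, w, b):
                  acc = b
                  for xi, wi in zip(x, w):
                      acc += xi * wi       # the MAC
                  return max(acc, 0.0)     # ReLU activation

              print(neuron([1.0, 2.0, 3.0], [0.5, -0.25, 0.1], 0.05))  # 0.35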

          • imtringued5 hours ago |parent

            A DSP contains analog-to-digital and digital-to-analog converters, plus DMA for fast transfers to main memory and fixed-function blocks for finite impulse response and infinite impulse response filters.

            The fact that they also support vector operations or matrix multiplication is kind of irrelevant and not a defining characteristic of DSPs. If you want to go that far, then everything is a DSP, because all signals are analog.

            • bfrog5 hours ago |parent

              See here https://intel.github.io/intel-npu-acceleration-library/npu.h...

              Maybe also note that Qualcomm has renamed their Hexagon DSP to Hexagon NN. Likely the change was adding activation functions, but otherwise it's a VLIW architecture with accelerated MAC operations, aka a DSP architecture.

            • bryanlarsen4 hours ago |parent

              I've worked on DSP's with none of those things. Well, they did have DMA.

          • duped5 hours ago |parent

            ASICs bake one algorithm into the chip. DSPs are programmable, like GPUs or CPUs. The thing that historically set them apart were MAC/FMA and zero overhead loops. Then there are all the nice to haves, like built in tables of FFT twiddle factors, helpers for 1D convolution, vector instructions, fixed point arithmetic, etc.

            What makes a DSP different from a GPU is the algorithms typically do not scale nicely to large matrices and vectors. For example, recursive filters. They are also usually much cheaper and lower power, and the reason they lost popularity was because Arm MCUs got good enough and economy of scale kicked in.

            I've written code for DSPs both in college and professionally. It's much like writing code for CPUs or MCUs (it's all C or C++ at the end of the day). But it's very different from writing compute shaders or designing an ASIC.
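
            For example (an illustrative first-order IIR filter): the loop-carried dependence on the previous output is what keeps this class of algorithm from flattening into one big matrix multiply the way a dense NN layer does.

              # y[n] = a*x[n] + b*y[n-1]: each output depends on the previous one.
              def iir(x, a=0.1, b=0.9):
                  y, out = 0.0, []
                  for xn in x:
                      y = a * xn + b * y   # recursive (loop-carried) dependency
                      out.append(y)
                  return out

              print(iir([1.0, 0.0, 0.0, 0.0]))  # geometric decay: 0.1, 0.09, 0.081, ...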

  • lukeschlather2 hours ago

    This feels a lot like the RISC/CISC debate. More academic than it seems. Nvidia is designing their GPUs primarily to do exactly the same tasks TPUs are doing right now. Even within Google it's probably hard to tell whether or not it matters on a 5-year timeframe. It certainly gives Google an edge on some things, but in the fullness of time "GPUs" like the H100 are primarily used for running tensor models and they're going to have hardware that is ruthlessly optimized for that purpose.

    And outside of Google this is a very academic debate. Any efficiency gains over GPUs will primarily turn into profit for Google rather than benefit for me as a developer or user of AI systems. Since Google doesn't sell TPUs, they are extremely well-positioned to ensure no one else can profit from any advantages created by TPUs.

    • turtletontine2 hours ago |parent

      > Since Google doesn't sell TPUs, they are extremely well-positioned to ensure no one else can profit from any advantages created by TPUs.

      First part is true at the moment, not sure the second follows. Microsoft is developing their own “Maia” chips for running AI on Azure with custom hardware, and everyone else is also getting in the game of hardware accelerators. Google is certainly ahead of the curve in making full-stack hardware that’s very very specialized for machine learning. But everyone else is moving in the same direction: lots of action is in buying up other companies that make interconnects and fancy networking equipment, and AMD/NVIDIA continue to hyper specialize their data center chips for neural networks.

      Google is in a great position, for sure. But I don’t see how they can stop other players from converging on similar solutions.

  • thesz6 hours ago

    5 days ago: https://news.ycombinator.com/item?id=45926371

    Sparse models have the same quality of results but have fewer coefficients to process - in the case described in the link above, sixteen (16) times fewer.

    This means that these models need 8 times less data to store, can be 16+ times faster, and use 16+ times less energy.

    TPUs are not all that good in the case of sparse matrices. They can be used to train dense versions, but inference efficiency with sparse matrices may not be all that great.
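
    One plausible reading of why 16x fewer coefficients only buys ~8x smaller storage: a sparse layout has to store an index alongside every surviving value (assuming 2-byte values and 2-byte column indices, CSR-style, row pointers ignored).

      n_dense = 1_000_000                # illustrative parameter count
      dense_bytes  = n_dense * 2         # bf16 values only
      n_sparse = n_dense // 16           # keep 1 weight in 16
      sparse_bytes = n_sparse * (2 + 2)  # value + column index per nonzero
      print(dense_bytes / sparse_bytes)  # 8.0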

    • HarHarVeryFunny6 hours ago |parent

      TPUs do include dedicated hardware, SparseCores, for sparse operations.

      https://docs.cloud.google.com/tpu/docs/system-architecture-t...

      https://openxla.org/xla/sparsecore

  • loph5 hours ago

    This is highly relevant:

    "Meta in talks to spend billions on Google's chips, The Information reports"

    https://www.reuters.com/business/meta-talks-spend-billions-g...

  • hirako200038 minutes ago

    Then Groq should reign emperor?

  • thelastgallon6 hours ago

    With its AI offerings, can Google suck the oxygen out of AWS? AWS grew big because of compute. The AI spend will be far larger than compute. Can Google launch AI/Cloud offerings with free compute bundled? Use our AI, and we'll throw in compute for free.

  • ricardo816 hours ago

    It's a cool subject and article and things I only have a general understanding of (considering the place of posting).

    What I'm sure about is that having a programming unit more purposed to a task is more efficient than a general programming unit designed to accommodate all programming tasks.

    More and more of the economics of programming boils down to energy usage, and invariably to physical rules; the more efficient the process, the less energy consumed.

    As a layman, it makes general sense. Maybe a future where productivity is based more on energy efficiency than monetary gain pushes the economy in better directions.

    Cryptocurrency and LLMs seem like they'll play out that story over the next 10 years.

  • jimbohn6 hours ago

    Given the importance of scale for this particular product, any company placing itself on "just" one layer of the whole story is at a heavy disadvantage, I guess. I'd rather have a winning google than openai or meta anyway.

    • subroutine6 hours ago |parent

      > I'd rather have a winning google than openai or meta anyway.

      Why? To me, it seems better for the market, if the best models and the best hardware were not controlled by the same company.

      • jimbohn6 hours ago |parent

        I agree, it would be the best of bad cases, in a sense. I have low trust in OpenAI due to its leadership, and in Meta, because, well, Meta has history, let's say.

  • jmward016 hours ago

    How much of current GPU and TPU design is based around attention's bandwidth-hungry design? The article makes it seem like TPUs aren't very flexible, so big model architecture changes, like new architectures that don't use attention, may lead to useless chips. That being said, I think it is great that we have some major competing architectures out there. GPUs, TPUs and UMA CPUs are all attacking the ecosystem in different ways, which is what we need right now. Diversity in all things is always the right answer.

  • paulmist6 hours ago

    > The GPUs were designed for graphics [...] However, because they are designed to handle everything from video game textures to scientific simulations, they carry “architectural baggage.” [...] A TPU, on the other hand, strips away all that baggage. It has no hardware for rasterization or texture mapping.

    With simulations becoming key to training models doesn't this seem like a huge problem for Google?

  • d--b an hour ago

    At this stage, it is somewhat clear that it doesn't really matter who's ahead in the race, cause everyone else is super close behind...

  • siliconc0w6 hours ago

    Google has always had great tech - their problem is the product or the perseverance, conviction, and taste needed to make things people want.

    • thomascgalvin6 hours ago |parent

      Their incentive structure doesn't lead to longevity. Nobody gets promoted for keeping a product alive, they get promoted for shipping something new. That's why we're on version 37 of whatever their chat client is called now.

      I think we can be reasonably sure that search, Gmail, and some flavor of AI will live on, but other than that, Google apps are basically end-of-life at launch.

      • nostrademons5 hours ago |parent

        It's telling that basically all of Google's successful projects were either acquisitions or were sponsored directly by the founders (or sometimes, were acquisitions that were directly sponsored by the founders). Those are the only situations where you are immune from the performance review & promotion process.

        • sidibe2 hours ago |parent

          They've actually had many very successful projects that make the few products and acquisitions you are thinking of work. It's true most of their end products don't work or get abandoned, but it stretches their infrastructure in ways that work out well in the long run.

      • siliconc0w5 hours ago |parent

        It's also paradoxically the talent in tech that isolates them. The internal tech stack is so incredibly specialized, most Google products have to either be built for internal users or external users.

        Agree there are lots of other contributing causes like culture, incentives, security, etc.

    • villgax6 hours ago |parent

      Fuchsia or me?

  • kittikitti3 hours ago

    You can't really buy a TPU, you have to buy the entire data center that includes the TPU plus the services and support. In Google Colab, I often don't prefer the TPU either because the documentation for the AI isn't made for it. While this could all change in the long term, I also don't see these changes in Google's long term strategy. There's also the problem with Google's graveyard which isn't mentioned in the long term of the original article. Combined with these factors, I'm still skeptical about Google's lead on AI.

  • giardini4 hours ago

    All this assumes that LLMs are the sole mechanism for AI and will remain so forever: no novel architectures (neither hardware nor software), no progress in AI theory, nothing better than LLMs, simply brute force LLM computation ad infinitum.

    Perhaps the assumptions are true. The mere presence of LLMs seems to have lowered the IQ of the Internet drastically, sopping up financial investors and resources that might otherwise be put to better use.

    • olalonde4 hours ago |parent

      That's incorrect. TPUs can support many ML workloads, they're not exclusive to LLMs.

  • bhouston6 hours ago

    In my 20+ years of following NVIDIA, I have learned to never bet against them long-term. I actually do not know exactly why they continually win, but they do. The main issue is that they have a 3-4 year gap between wanting a new design pivot and realizing it (silicon has a long "pipeline"), so when it seems they may be missing a new trend or swerve in the demands of the market, it is often simply because of this delay.

    • bryanlarsen6 hours ago |parent

      You could have said the same thing about Intel for ~50 years.

      • tim3332 hours ago |parent

        Depends on the top management though. I imagine Nvidia will keep doing well while Jensen Huang is running things.

    • newyankee6 hours ago |parent

      Fair, but the 75% margins can be reduced to 25% with healthy competition. The lack of competition in the frontier chips space was always the bottleneck to commoditization of computation, if such a thing is even possible

  • clickety_clack6 hours ago

    Any chance of a bit of support for jax-metal, or incorporating apple silicon support into Jax?

  • dana3216 hours ago

    That and the fact they can self-fund the whole AI venture and don't require outside investment.

    • jsheard6 hours ago |parent

      That and they were harvesting data way before it was cool, and now that it is cool, they're in a privileged position since almost no-one can afford to block GoogleBot.

      They do voluntarily offer a way to signal that the data GoogleBot sees is not to be used for training, for now, and assuming you take them at their word, but AFAIK there is no way to stop them doing RAG on your content without destroying your SEO in the process.

      • boredatoms5 hours ago |parent

        Do people still get organic search traffic from google?

      • lazyfanatic425 hours ago |parent

        Wow, they really got folks by the short hairs if that is true...

    • mrbungie6 hours ago |parent

      The most fun fact about all the developments post-ChatGPT is that people apparently forgot that Google was doing actual AI before AI meant (only) ML and GenAI/LLMs, and they were top players at it.

      Arguably OpenAI's main raison d'être was to be a counterweight to that pre-2023 Google AI dominance. But I'd also argue that OpenAI lost its way.

      • lvl1556 hours ago |parent

        And they forgot to pay those people so most of them left.

        • OccamsMirror6 hours ago |parent

          To be fair, they weren't increasing Ads revenue.

          • lvl1556 hours ago |parent

            They literally gave away their secret sauce to OpenAI and pretended like it wasn’t a big opportunity.

            • mrbungie6 hours ago |parent

              Just as expected from a big firm with slower organizational speed. They can afford to make those mistakes.

  • DonHopkins an hour ago

    Will Google sell TPUs that can be plugged into stock hardware, or custom hardware with lots of TPUs? Our customers want all their video processing to happen on site, and don't want their video or other data to touch the cloud, so they're not happy about renting cloud TPUs or GPUs. Also it would be nice to have smart cameras with built-in TPUs.

  • mosura6 hours ago

    This is the “Microsoft will dominate the Internet” stage.

    The truth is the LLM boom has opened the first major crack in Google as the front page of the web (the biggest since Facebook), in the same way the web in the long run made Windows so irrelevant Microsoft seemingly don’t care about it at all.

    • villgax6 hours ago |parent

      Exactly, ChatGPT pretty much ate away ad volume & retention, if the already garbage search results weren't enough. Don't even get me started on Android & Android TV as an ecosystem.

      • IncreasePosts4 hours ago |parent

        That's not the story that GOOG's quarterly earnings reports tell (ad revenue up 12% YoY).

        • pzo2 hours ago |parent

          Most likely because they got more aggressive with the campaign against ad blocking in Chrome and with more ads on YouTube.

  • lvl1556 hours ago

    Right because people would love to get locked into another even more expensive platform.

    • svantana6 hours ago |parent

      That's mentioned in the article, but is the lock-in really that big? In some cases, it's as easy as changing the backend of your high-level ML library.

      • LogicFailsMe6 hours ago |parent

        That's what it is on paper. But in practice you trade one set of hardware idiosyncrasies for another and unless you have the right people to deal with that, it's a hassle.

        • lvl1556 hours ago |parent

          On top, when you get locked into Google Cloud, you’re effectively at the mercy of their engineers to optimize and troubleshoot. Do you think Google will help their potential competitors before they help themselves? Highly unlikely considering their actions in the past decade plus.

          • LogicFailsMe3 hours ago |parent

            Given my Fitbit's inability to play nice with my pixel phone, I have zero faith in Google engineers.

            What else would one expect when their core value is hiring generalists over specialists* and their lousy retention record?

            *Pay no attention to the specialists they acquihire and pay top dollar... And even they don't stick around.

      • tempest_5 hours ago |parent

        That is like how every ORM promises you can just swap out the storage layer.

        In practice it doesn't quite work out that way.

      • Irishsteve6 hours ago |parent

        I think you can only run on Google Cloud, not AWS, bare metal, Azure, etc.

  • villgax6 hours ago

    https://killedbygoogle.com

    • mupuff12346 hours ago |parent

      That's actually one of the reasons why Google might win.

      Nvidia is tied down to support previous and existing customers while Google can still easily shift things around without needing to worry too much about external dependencies.

    • riku_iki6 hours ago |parent

      It's all small products which didn't receive traction.

      • davidmurdoch6 hours ago |parent

        It's not though. Chromecast, g suite legacy, podcast, music, url shortener,... These weren't small products.

        • IncreasePosts4 hours ago |parent

          Chromecast is "gone" because it bridged the gap of dumb tvs needing streaming capabilities. Now almost every tv sold has some kind of smart feature or can stream natively so Chromecast aren't needed.

        • riku_iki6 hours ago |parent

          Chromecast is alive, Podcasts and Music were migrated to the YouTube app, and the URL shortener is not a core business, just a side hustle for Google. Not familiar with G Suite legacy.

      • bgwalter6 hours ago |parent

        Google Hangouts wasn't small. Google+ was big and supposedly "the future" and is the canonical example of a huge misallocation of resources.

        Google will have no problem discontinuing Google "AI" if they finally notice that people want a computer to shut up rather than talk at them.

        • riku_iki6 hours ago |parent

          > Google+ was big

          How do you define big? My understanding is they failed to compete with Facebook and decided to redirect resources somewhere else.

          • dekhn38 minutes ago |parent

            At the time Google+ was started and shortly after, leadership (larry page at that time) focused the attention of the company on it. There was a social bonus (that you'd get if you integrated your product), there were large changes to existing systems to support Google+, and the company made it quite clear it thought that social was the direction to go and that Google+ was going to be an enormous product.

            I and a lot of other googlers were really confused by all of this because at the time we were advocating that Google put more effort into its nascent cloud business (often to get the reply "but we already have appengine" or "cloud isn't as profitable as ads") and that social, while getting a lot of attention, wasn't really a good business for google to be in (with a few exceptions like Orkut and Youtube, Google's attempts at social have been pretty uninspired).

            There were even books written at the time that said Google looked lazy and slow and that Meta was going to eat their lunch. But shortly after Google+ tanked, Google really began to focus on Cloud (in a way that pissed off a lot of Googlers in the same way Google+ did- by taking resources and attention from other projects). Now, Meta looks like its going to have a challenging future while Google is on to achieving what Larry Page originally intended: a reliable revenue stream that is reinvested into development of true AI.

          • Workaccount26 hours ago |parent

            Google completely fumbled Google+ by doing a slow invite only launch.

            The hype when it was first coming to market was intense. But then nobody could get access because they heavily restricted sign ups.

            By the time it was in "open beta" (IIRC like 6-7 mos later), the hype had long died and nobody cared about it anymore.

            • kaz-inc5 hours ago |parent

              In my recollection, what killed g+ was forcing your YouTube account to become your g+ account, with your public name attached to the trashpit YouTube comments used to be. Everybody protested using g+, but the "Google account for everything" stuck around anyways.

          • lokar4 hours ago |parent

            They put a lot of effort into it, but it never had much usage.

      • villgax6 hours ago |parent

        Wait until Apple's ChromeBook competitor shows up to eat their lunch just like switching to another proprietary stack with no dev ecosystem will die out. Sure they'll go after big ticket accounts, also take a guess at what else gets sanctioned next.

        • IncreasePosts4 hours ago |parent

          Isn't an iPad with a keyboard or the air essentially a Chromebook competitor?

          The only lunch that will be eaten is Apple's own, since it would probably cannibalize their own sales of the MacBook air

  • qwertox6 hours ago

    How high are the chances that as soon as China produces their own competitive TPU/GPU, they'll invade Taiwan in order to starve the West in regards to processing power, while at the same time getting an exclusive grip on the Taiwanese Fabs?

    • CuriouslyC6 hours ago |parent

      The US would destroy TSMC before letting China have it. China also views military conquest of Taiwan as less than ideal for a number of reasons, so I think right now it's seen as a potential defensive move in the face of American aggression.

    • bryanlarsen5 hours ago |parent

      China will invade Taiwan when they start losing, not when they're increasingly winning.

      As long as "tomorrow" is a better day to invade Taiwan than today is, China will wait for tomorrow.

      • Xss35 hours ago |parent

        Their demographics beg to differ.

        • bryanlarsen4 hours ago |parent

          If demographics were a big deal, it'd be part of the same "better to invade today or tomorrow" calculation.

          Zeihan's predictions on China have been fabulously wrong for 20+ years now.

    • A4ET8a8uTh0_v26 hours ago |parent

      Seems low at the moment, with the concept of a G2 being floated as a generic understanding of China's ascension to where Russia used to be, effectively recreating a bipolar, semi-Cold War world order. Mind, I am not saying it's impossible, but there are reasons China would want to avoid this scenario (it is probably one of the few things the US would not tolerate and would likely retaliate against).

    • hjouneau6 hours ago |parent

      If they have the fabs but ASML doesn't send them their new machines, they will just end up in the same situation as now, just one generation later. If China wants to compete, they need to learn how to make the EUV light and mirrors.

    • Xss35 hours ago |parent

      The fabs would be destroyed in such a situation. The West would absolutely play that card in negotiations.

    • gostsamo6 hours ago |parent

      Not very. Those fabs are vulnerable things, shame if something happens to them. If China attacks, it would be for various other reasons and processors are only one of many considerations, no matter how improbable it might sound to an HN-er.

      • qwertox6 hours ago |parent

        What if China becomes self-sufficient enough to no longer rely on Taiwanese Fabs, and hence having no issues with those Fabs getting destroyed. That would put China as the leader once and for all.

        • gostsamo6 hours ago |parent

          First, the US has advanced fab capabilities and, in case of need, can develop them further. On the other side, China will suffer Russia-style blowback while caught up in a nasty war with Taiwan.

          Totally possible, but the second-order effects are much more complex than "leader once and for all". The path to victory for China is not a war in spite of the West, but a war when the West would not care.

          • the_af6 hours ago |parent

            The best path for victory for China is probably no war at all. War is wasteful and risky.

    • GordonS6 hours ago |parent

      Highly unlikely. Despite the rampant anti-Chinese FUD that's so prevalent in the media (and, sadly, here on HN), China isn't really in the habit of invading other lands.

      • CuriouslyC6 hours ago |parent

        The plot twist here is that China doesn't view Taiwan as foreign.

        • the_af5 hours ago |parent

          But China also doesn't see war as the best path forward in Taiwan (they want to return it to the mainland, not lay waste to it). The grandparent comment is unfairly downvoted in my opinion; the fact remains that modern China is far less likely to be involved in military campaigns than, say, the US.