> And there are reasons to even be really bullish about AI’s long-run profitability — most notably, the sheer scale of value that AI could create. Many higher-ups at AI companies expect AI systems to outcompete humans across virtually all economically valuable tasks. If you truly believe that in your heart of hearts, that means potentially capturing trillions of dollars from labor automation. The resulting revenue growth could dwarf development costs even with thin margins and short model lifespans.
We keep seeing estimates like this repeated by AI companies and such. There is something that really irks me about it though, which is that it assumes companies that are replacing labor with LLMs are willing to pay as much as (or at least a significant fraction of) the labor costs they are replacing.
In practice, I haven't seen that to be true anywhere. If Claude Code (for example) can replace 30% of a developer's job, you would expect companies to be willing to pay tens of thousands of dollars per seat for it. Anecdotally, at $WORK we get nickel-and-dimed on dev tools (somewhat less so for AI tools). I don't expect corporate to suddenly agree to pay Anthropic $50k per developer even if they can lay off a third of us. Will anyone actually pay enough to realize that "trillions of dollars" of capture?
Why do they have to charge employee-tier prices? At the scale of job displacement they're hoping for, $2k/month per employee is cheaper than even outsourcing overseas, and I think for a lot of jobs that price is still profitable for the provider. Not every profession burns through tokens the way developers do every hour.
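To make the gap concrete, here is a back-of-the-envelope sketch with purely illustrative numbers (the fully loaded developer cost, automation share, and seat prices are all assumptions, not anyone's real figures):

    # All inputs below are illustrative assumptions for the pricing-gap argument.
    fully_loaded_cost = 150_000          # $/year per developer (assumed)
    share_automated = 0.30               # fraction of the job the tool replaces (assumed)
    labor_value = fully_loaded_cost * share_automated   # $45,000/year of displaced labor

    seat_price_today = 12 * 200          # $2,400/year, a typical "expensive" plan today
    seat_price_hoped = 12 * 2_000        # $24,000/year, the $2k/month figure above

    print(labor_value, seat_price_today, seat_price_hoped)
    # 45000.0 2400 24000 -- even the hoped-for price sits well below the labor
    # value displaced, and current seat prices are an order of magnitude lower still.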
You’re right, and a big reason they won’t be able to capture the “full” value is because of competition, especially with open-source models. Sure, Claude will probably always be better… but better to the tune of $50k/seat?
LLMs are, ultimately, software. And we’ve had plenty of advances in software. None of them are priced at the market value of the labor they save. That’s just not how the economics work.
> it assumes companies that are replacing labor with LLMs are willing to pay as much as (or at least a significant fraction of) the labor costs they are replacing.
And it’s worth reiterating that most (if not all) of these LLM/AI providers are currently operating at significant losses. If they aim to become even modestly profitable, prices will have to increase substantially.
I think by the time people are willing to spend that kind of money, we'd have to be in AGI territory, and at that point all economic bets are off. Investing in AI feels strange for that reason: I don't see a world where the slop becomes valuable enough to cover current investment levels, and I don't see a world in which the investments matter if we get superhuman AGI.
Framing GPT-5 as a loss because of its short run like that is a bit weird. They say "the R&D that went into GPT-5 likely informs future models like GPT-6", but that really understates what is happening here.
Barring solid evidence otherwise, you would think GPT-5.2 was built largely on top of GPT-5, enough that possibly the majority of the cost of 5.2 was really incurred in developing GPT-5.
It would be like shipping v1.0 on day one, discovering a bug, and shipping v1.01 the next day, then reporting at the end of the year that v1.0 massively lost money but that v1.01 was the single largest return on a single day of development you've ever seen.
If that were the case, we would see OpenAI's R&D costs dropping. I'm not sure that it is.
They do still train models from scratch, and they are still making larger models.
You would expect that to use a lot more resources.
It seems likely that the GPT-5.x releases are all extensions of the GPT-5 base model, with similar numbers of parameters.
The money spent on extending a base model would be dwarfed by the scale increase of the next major version.
The Uber comparison is funny because Uber burned $32B over 14 years before reaching profitability. OpenAI alone is burning something like $10B/year and growing, so the US AI labs combined are probably close to 10x Uber's burn rate.
Given that AI lab burn rates are high enough for AI capex to show up in nationwide economic stats, this clearly cannot keep burning for that long.
So what happens first: labs figure out how to get compute costs down by an order of magnitude, they add enough value to raise prices by an order of magnitude (the Uber route), or some labs begin imploding?
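Rough arithmetic behind that comparison (Uber's total burn is the figure cited above; the lab-level numbers are loose public estimates):

    # Uber's average burn vs. one AI lab's current burn; both figures are rough.
    uber_total_burn = 32e9               # $ burned over ~14 years before profitability
    uber_avg_per_year = uber_total_burn / 14          # ~$2.3B/year

    openai_burn_per_year = 10e9          # rough estimate for OpenAI alone
    print(openai_burn_per_year / uber_avg_per_year)   # ~4.4x Uber's average, for one lab
    # Add the other US labs and a combined burn near 10x Uber's average
    # does not look far-fetched.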
Keep in mind another aspect of the comparison: there wasn't an entire supply-chain spending effect triggered by Uber. That is, you didn't have new car companies building new factories to produce 10x more cars, new roads being built, new gas stations being built, etc., the way you have for AI.
It's like the entire economy has taken 1 giant correlated bet here.
Original is here (linked in the article):
https://epoch.ai/gradient-updates/can-ai-companies-become-pr...
Thanks, could the link for this post be replaced with the original?
What I don't understand is: why would a company pay tens of thousands of dollars a month to Anthropic for Claude when a Chinese LLM is 99% as good, is open weight, runs on US servers, and costs 5% of the price?
By what metrics are they 99% as good? There are a lot of benchmarks out there. Please share them.
I think the answer lies in the "we actually care a lot about that 1% (which is actually a lot more than 1%)".
Open models have been about 6 to 9 months behind frontier models, and this has been the case since 2024. That is a very long time for this technology at its current rate of development. If fast-takeoff theory is right, this gap should widen (although with Kimi K2.5 it might have actually shortened).
If we consider what typically happens with other technologies, we would expect open models to match others on general intelligence benchmarks in time. Sort of like how every brand of battery-powered drill you find at the store is very similar, despite being head and shoulders better than the best drill from 25 years ago.
> That is a very long time for this technology at its current rate of development.
Yes, as long as that gap stays consistent, there is no problem with building on ~9-month-old tech from a business perspective. Heck, many companies lag decades behind tech advancements and are doing fine.
> Sort of like how every brand of battery-powered drill you find at the store is very similar, despite being head and shoulders better than the best drill from 25 years ago.
They all get made in China, mostly in the same facilities. Designs tend to converge under such conditions, especially since design is not open-loop: you talk to the supplier that will make your drill, and the supplier might mention how they already make drills for others.
I'm still testing myself and can't make a confident statement yet, but Artificial Analysis is a solid, independent (though admittedly somewhat imperfect) source for a general overview: https://artificialanalysis.ai/
Going purely by Artificial Analysis, Kimi K2.5 is rather competitive on pure output quality, its agentic evals are close to or beating US-made frontier models, and, lest we forget, the model is far more affordable than those competitors, to the point where it is frankly silly that we are comparing them at all.
For what it's worth, of the models I have been able to test so far, and focusing purely on raw performance (meaning solely task adherence, output quality and agentic capabilities, so discounting price, speed and hosting flexibility), I have personally found the prior Kimi K2 Thinking model to be more usable and reliable overall than Gemini 3 Pro and Flash. Purely on output quality in very specific coding tasks, however, Opus 4.5 was in my testing leaps and bounds ahead of both the Gemini models and K2 Thinking, though its task adherence was surprisingly less reliable than Haiku 4.5 or K2 Thinking.
Given that Opus 4.5 is many times more expensive and in some cases adheres to tasks less reliably, I really cannot say that it is superior or that Kimi K2 Thinking is inferior here. The latter is certainly better in my specific usage than any Gemini model, and again, I haven't yet gone through this with K2.5. I try not to presume from the outset that K2.5 is better than K2 Thinking, though even if K2.5 just stays at the same level of quality and reliability while adding multimodal input, that would make the model very competitive.
As a heavy Claude Code user, I would like to have that option.
But if it's just 33% as good, I wouldn't bother.
Top LLMs have passed a usability threshold in the past few months. I haven't had the feeling the open models (from any country) have passed it as well.
When they do, we'll have a realistic option of using the best and the most expensive vs the good and cheap. That will be great.
Maybe in 2026.
Usain Bolt's top speed is about 44.72 km/h. My top speed sprinting is about 25 km/h. That's at least 50% as good. But I'd have a hard time getting paid even half as much as Mr Bolt.
OK but what if, staying in that analogy, you just have to wait 6-12 months to become as fast as Usain is right now?
In this analogy Usain Bolt will get twice as fast too, right? Using a solution that is always half as good as the SotA would put a company (or a national Olympic team) at a significant disadvantage to competitors that do use the SotA.
Yeah, but you'd both be quite suitable to go walk to the grocery store.
Even someone with marginally less top speed isn’t getting paid half as much as Usain Bolt. Athletes aren’t getting paid per unit of output. This analogy is not analogising.
I don't think 99% of the best is a good metric.
It is highly dependent on what the best represents.
If you had a 100% chance of not breaking your arm on any given day, what kind of value would you place on that over a 99% chance on any given day? I would imagine it to be pretty high.
The top models are not perfect, so they don't really represent 100% of anything on any scale.
If the best you could do is a 99% chance of not breaking your arm on any given day, then perhaps you might be more stoic about something that is 99% of 99%, which is close enough to 98% that you are 'only' going to double the number of broken arms you get in a year.
I suspect using AI in production will be calculated more as likelihood of pain than as increased widgets per hour. Recovery from disaster can easily eat any productivity gains.
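Putting numbers on the broken-arm version of that argument:

    # Daily failure rates: the "best" case (1%) vs. "99% of 99%" (~1.99%).
    p_fail_best = 1 - 0.99               # 0.01
    p_fail_clone = 1 - 0.99 * 0.99       # ~0.0199

    days = 365
    print(days * p_fail_best)            # ~3.7 expected broken arms per year
    print(days * p_fail_clone)           # ~7.3 -- roughly double, as noted above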
I tried that but I'm back to paying OpenAI $200/month because the quality was significantly worse on my codebase.
How do they run on US servers? Self-host? That's not going to be cheap while the big AI players hoard resources like memory.
There are many providers (Fireworks, Groq, Cerebras, Google Vertex): some use fairly common Nvidia hardware, others their own solutions focused solely on high-throughput inference. They often tend to be faster, cheaper and/or more reliable than what the lab that trained the model charges [0], simply because there is some competition, unlike with US frontier models, which at best can be hosted by Azure, AWS or GCloud at the same price as the first party.
And who funds those providers? They can clearly pull the plug at any time. The so-called many providers are the same group under the hood.
Nvidia, for example, was noted as having most of its revenue heavily concentrated in only a few major customers.
You just click "buy" with another provider? They just host the model for money, nothing more.
To be clear: the models are open weights that anyone can simply download, because many labs publish them as such. The providers in question are generic hosts. It's the same as getting managed WordPress hosting somewhere.
> You just click "buy" with another provider? They just host the model for money, nothing more.
Start with where the GPUs and the rest of it come from.
I don’t really understand your point. A company offering only inference will inherently always have lower costs and require less hardware than one that both provides inference and does training of new models.
The point is that it doesn't work like that. Check out all the special-purpose vehicles, loans and fundraises going around in that space. Deals announced as $x billion in funding turn out to be millions plus GPU rentals under special terms, and the offering of those deals still hinges on said top 3 customers buying enough to float things up.
i.e., whether you go for "only inference" or "both", it works out similarly somewhere along the chain.
If you are referring to Nvidia, note that Vertex, Groq, Cerebras, etc. do not rely on GPUs for much (or any) of their inference, and are all the better for it.
That is kind of why Nvidia sees itself forced to make such deals: to lock labs in before it loses another one to objectively superior inference.
Staying purely with US labs, they just lost Meta, and both Anthropic and Gemini models have been available on Vertex without relying on GPUs for a while. OpenAI is likewise turning towards Cerebras, so yeah, those deals will mean little for inference.
Those circular deals aren't a thing for the inference providers I tend to go for, and they don't change the fact that using a full GPU purely for inference is like using a pickup truck for a track day. It can be done, and the flexibility has advantages, but once you focus on one task, there are better tools for the job.
I'm starting to add inference providers to computeprices.com, but if you even just look at GPU/hr rentals, there are some reasonable options out there.
I personally have been enjoying shadeform to build the GPU setup I like.
Aren't there pretty good indications that the Chinese LLMs have been trained on top of the expensive models?
Their cost is not real.
Plus you have things like MCP or agents that are mostly being spearheaded by companies like Anthropic. So if this is "the future" and you believe in it, then you should pay a premium to spearhead it.
You want to bet on the first Boeing, not the cheapest copy of a Wright brothers plane.
(Full disclosure: I don't think it's the future, and I think we are over-leveraging on AI to a degree that is, no pun intended, misanthropic.)
> Aren't there pretty good indications that the Chinese LLMs have been trained on top of the expensive models?
So what?
Well it raises an interesting conundrum. Suppose there's a microcontroller that's $5.00 and another that's $0.50. The latter is a clone of the former. Are you better off worrying only about your short term needs, or should you take the long view and direct your business towards the former despite it being more expensive?
Suppose both microcontrollers will be out of date in a week and replaced by far more capable ones.
The long view is to see the microcontroller as a commodity piece of hardware that is changing rapidly. Now is not the time to go all-in on Betamax and take out 10-year leases on physical Blockbuster stores when streaming is two weeks away.
AI is possibly the most open technological advance I have experienced. There is no excuse, this time, for skilled operators to get stuck for decades with AWS or some other proprietary blend of vendor lock-in.
This isn't betamax vs VHS. It's VHS vs a clone of VHS. The long view necessarily has to account for R&D, long term business partners, and lots of other externalities. The fact that both manufacturers will have a new model of VCR out next month, and yet another the month after that, really has nothing to do with the conundrum I gave voice to.
I'll also note that there's zero vendor lock-in in either scenario. It's a simple question about the tradeoffs of indirect parasitism within the market. I'm not even taking a side on it. I don't even know for certain that any given open weights Chinese model was trained against US frontier models. Some people on HN have made accusations but I haven't seen anything particularly credible to back it up.
If the clone is 1/10th the price and of equivalent quality, why would I use the original? I would be undercut by my competitors if I did that; it would be a very bad business decision.
Well, the company behind the former microcontroller has gone out of its way to make getting and developing on actual hardware as difficult and expensive as possible, and could reasonably be accused of "suspect financial shenanigans", while the other company will happily sell me the microcontroller at a reasonable price. And sure, they started off cloning the former, but their own stuff is getting really quite good these days.
So really, the argument pretty well makes itself in favour of the $0.50 microcontroller.
That's a very tenuous analogy. Microcontrollers are circuits that are designed. LLMs are circuits that learned using vast amounts of data scraped from the internet and from pirated e-books [1][2][3].
[1]: https://finance.yahoo.com/news/nvidia-accused-trying-cut-dea...
[2]: https://arstechnica.com/tech-policy/2025/12/openai-desperate...
[3]: https://www.businessinsider.com/anthropic-cut-pirated-millio...
> Microcontrollers are circuits that are designed. LLMs are circuits that learned using vast amounts of data
So I suppose the AI companies employ all those data scientists and low-level performance engineers to, what, manage their website perhaps?
It's poor form to go around inserting your pet issue where it isn't relevant.
You're asking whether businesses will choose to pay a 1000% markup on commodities?
> Aren't there pretty good indications that the Chinese LLMs have been trained on top of the expensive models?
There are pretty good indications that the American LLMs have been trained on top of stolen data.
This is proven. You can prove it yourself easily. Take a novel from your bookshelf, type in any sentence from the novel and ask it what book it's from. Ask it for the next sentence.
This works with every novel I've tried so far in Gemini 3.
My actual prompt was a bit more convoluted than this (involving translation) so you may need to experiment a bit.
> Aren't there pretty good indications that the Chinese LLMs have been trained on top of the expensive models?
How do you even do that? You can train on glorified chat logs from an expensive model, but that's hardly the same thing. "Model extraction" is ludicrously inefficient.
> How do you even do that?
I am not going to comment on how they did it, but they were openly accused of it by OpenAI. I believe the discussion is over distillation vs. foundation models.
https://www.jdsupra.com/legalnews/openai-accuses-deepseek-of...
There are other theories, e.g. that OpenAI inflated its training costs to attract further investment in later growth quarters, while DeepSeek under-reported its costs to portray China as the more cost-efficient place to invest. If that were the case, then the performance is similar with similar training costs, but one side counted even the coffee from the office coffee machine in its total while the other counted only the bare compute cycles and not the GPUs, energy, engineering, etc. That is plausible too.
I have no dog in the fight, but the first accusation seemed quite serious, hence why I asked.
This so-called "PC compatible" seems like a cheap copy; give me a real IBM every time.
> Their cost is not real.
They can’t even officially account for any Nvidia GPUs they managed to buy outside the official channels.
> But we can still do an illustrative calculation: let’s conservatively assume that OpenAI started R&D on GPT-5 after o3’s release last April. Then there’d still be four months between then and GPT-5’s release in August, during which OpenAI spent around $5 billion on R&D. But that’s still higher than the $3 billion of gross profits. In other words, OpenAI spent more on R&D in the four months preceding GPT-5, than it made in gross profits during GPT-5’s four-month tenure.
These numbers actually look really good. OpenAI's revenue has been increasing 10x year-on-year since 2023, which means that spending even 3x the resources to produce another model next year would likely still generate a healthy profit. The newer models are more efficient, so inference costs tend to decrease as well.
As long as models keep improving, and crucially if that depends on the amount of compute you put into them, OpenAI and the other closed-source AI companies will succeed. If those two key assumptions stop being true, I can definitely see the whole house of cards crumbling as open-source competition eats their lunch.
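Spelling out that reasoning with the article's rough four-month figures and the growth assumptions above (gross margin held constant, which is itself a big assumption):

    # Illustrative only: $5B R&D and $3B gross profit are the article's rough
    # four-month estimates; the 10x revenue and 3x R&D multiples are assumptions.
    rd_cost = 5e9                        # R&D in the four months before GPT-5
    gross_profit = 3e9                   # gross profit over GPT-5's four-month run

    revenue_growth = 10                  # assumed year-on-year revenue multiple
    rd_growth = 3                        # assumed growth in R&D spend

    next_gross_profit = gross_profit * revenue_growth   # $30B, if margins hold
    next_rd = rd_cost * rd_growth                        # $15B
    print(next_gross_profit - next_rd)   # +$15B under these (generous) assumptions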
But on the other hand, every single model is competing in the sense that there's only one world to fill up with slop.
Maybe in a few years LinkedIn will have 10 times more AI slop, but that won't make either LinkedIn or the AI companies 10x more valuable.
I guess my point is that the models, IRL, salt the earth. Since model use became common, people have started using them to cheat on their schoolwork, cheat in job interviews, cheat on their duties to scientific integrity, and generally stop building and sharing communal knowledge. Why wouldn't you cheat and steal instead of improving yourself, when you're told that in the future your skill and intelligence will be utterly and completely negligible compared to that of The Machine?
This is already costing society trillions of dollars in practical damages. OpenAI is one of the world's biggest polluters in this sense, but in the past there has never been a need for legal protections against industrial-scale polluting of the well of human knowledge, so we're still catching up. Nevertheless, the fact that the unit economics look even this good is because they're dumping the real costs onto society itself.
If inference is A) expensive and B) has enough predicted demand to justify building a ton of datacenters, why is AI stuff being shoved everywhere, without an obvious profit motive, in ways that people hate? AI results in search boxes, Apple Mail summaries, that Sora video app, AI image gen that very few people would use if it weren't free.
I think the question is: as the technology matures, will the value of the models become more stable, and then what will happen to the price?
Compare it with phones or PCs: there was a time when each new version was a huge upgrade over the last, but eventually these AI models are going to mature and something else is going to happen.
The only thing you need to know about unit economics is this: https://epoch.ai/data-insights/llm-inference-price-trends
TL;DR: prices have come down by anywhere from 9x to 400x to achieve the same benchmark scores.
This should clearly tell you that the margins are high. It would be absurd for OpenAI to be perpetually running at a loss when prices have come down by ~50x on average. Instead of being ~50x cheaper, couldn't OpenAI be, say, 45x cheaper and be in profit? What's the difference?
I genuinely don't know why you need any more proof than just this statistic?
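The arithmetic being gestured at: if the achievable price floor fell ~50x, pricing at "only" 45x cheaper leaves roughly an 11% cushion over that floor (assuming, which is the contested part, that the 50x price still covers serving costs):

    # If a price point 50x below the old price still covers serving cost,
    # then pricing at only 45x below leaves a cushion. Numbers are made up.
    old_price = 50.0                     # $/1M tokens at some earlier point (assumed)
    floor_price = old_price / 50         # 1.00, assumed to cover cost
    asked_price = old_price / 45         # ~1.11
    print(asked_price / floor_price - 1) # ~0.11 -> ~11% margin over the assumed floor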
Does anyone have a paywall-free link to inference costs for models? The article links to an estimate from The Information, which has a hefty subscription cost and paywalls everything.
This totally glosses over the debacle that was GPT-4.5 (which possibly was GPT-5 too, btw), and the claim that it'll ever outcompete humans also totally depends on whether these systems still require human "steering" or work autonomously.
Frankly, it's real slop.
The discussion about Chinese vs US models misses a key enterprise reality: switching costs are enormous once you've built production systems around a specific API. Companies aren't just buying the model - they're buying reliability, compliance guarantees, and the ecosystem that reduces integration risk. Price matters, but in enterprise AI the "last mile" of trust and operational certainty often justifies significant premiums.
OTOH most model APIs are basically identical to each other. You can switch from one to the other using OpenRouter without even altering the code. Furthermore, they aren't that reliable (drop rates can be as high as 20%), and the compliance "guarantees" are, AFAIK, completely untested. Has anyone used the Copilot compliance guarantees to defend themselves in a copyright infringement suit yet?
I think you are right that trust and operational certainty justifies significant premiums. It would be great if trust and operational certainty were available.
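For instance, with an OpenAI-compatible client pointed at OpenRouter, swapping models is a one-string change; the model IDs below are just examples, not a recommendation:

    # Minimal sketch: the same call against a frontier model and an open-weight
    # one via OpenRouter's OpenAI-compatible endpoint. Model IDs are illustrative.
    import os
    from openai import OpenAI

    client = OpenAI(
        base_url="https://openrouter.ai/api/v1",
        api_key=os.environ["OPENROUTER_API_KEY"],
    )

    for model in ["anthropic/claude-sonnet-4.5", "moonshotai/kimi-k2"]:
        resp = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": "Summarize this ticket in one line."}],
        )
        print(model, resp.choices[0].message.content)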
I think AI has the potential to break this model. It reduces switching costs immensely.
Responses API is a commodity.
That's why OpenAI tries to push the Assistants API, the Agents SDK and ChatGPT Apps, which are more of a lock-in: https://senkorasic.com/articles/openai-product-strategy-2025
Funny thing is, even OpenAI seems to ignore the Assistants/Apps APIs internally. Codex (the CLI) uses the Responses API.