One thing I find frustrating is that management where I work has heard of 10x productivity gains. Some of those claims even come from early adopters at my work.
But that sets expectations way too high. Partly it is due to Amdahl's law: I spend only a portion of my time coding, and far more time thinking and communicating with others who are customers of my code. Even if it does make the coding 10x faster (and it doesn't most of the time), overall my productivity is 10-15% better. That is nothing to sneeze at, but it isn't 10x.
Maybe it's due to a more R&D-ish nature of my current work, but for me, LLMs are delivering just as much gains in the "thinking" part as in "coding" part (I handle the "communicating" thing myself just fine for now). Using LLMs for "thinking" tasks feels similar to how mastering web search 2+ decades ago felt. Search engines enabled access to information provided you know what you're looking for; now LLMs boost that by helping you figure out what you're looking for in the first place (and then conveniently searching it for you, too). This makes trivial some tasks I previously classified as hard due to effort and uncertainty involved.
At this point I'd say about 1/3 of my web searches are done through ChatGPT o3, and I can't imagine giving it up now.
(There's also a whole psychological angle in how having an LLM help sort and rubber-duck your half-baked thoughts makes many tasks seem much less daunting, and that alone makes a big difference.)
This, and if you add in a voice mode (e.g. ChatGPT's Advanced Mode), it is perfect for brainstorming.
Once I decide I want to "think a problem through with an LLM", I often start with just the voice mode. This forces me to say things out loud — which is remarkably effective (see: rubber duck debugging) — and it also gives me a fundamentally different way of consuming the information the LLM provides me. Instead of being delivered a massive amount of text, where some information could be wrong, I get a sequential stream where I can stop, pause, or redirect the LLM as soon as something makes me curious or as I find problems with what it said.
You would think that having this way of interacting would be limiting, as having a fast LLM output large chunks of information would let you skim through it and commit it to memory faster. Yet, for me, the combination of hearing things and, most of all, not having to consume so much potentially wrong info (what good is it to skim pointless stuff), ensures that ChatGPT's Advanced Voice mode is a great way to initially approach a problem.
After the first round with the voice mode is done, I often move to written-form brainstorming.
This 100%. Though I think there is a personality component to this. At least, I think when I speak.
From time to time I use an LLM to pretend to research a topic that I had researched recently, to check how much time it would have saved me.
So far, most of the time, my impression was "I would have been so badly misled and wouldn't even know it until too late". It would have saved me some negative time.
The only thing LLMs can consistently help me with so far is typing out mindless boilerplate, and yet it still sometimes requires manual fixing (but I do admit that it still does save effort). Anything else is hit or miss. The kind of stuff it does help researching with is usually the stuff that's easy to research without it anyway. It can sometimes shine with a gold nugget among all the mud it produces, but it's rare. The best thing is being able to describe something and ask what it's called, so you can then search for it in traditional ways.
That said, search engines have gotten significantly worse for research in the last decade or so, so the bar is lower for LLMs to be useful.
> So far, most of the time, my impression was "I would have been so badly misled and wouldn't even know it until too late". It would have saved me some negative time.
That was my impression with Perplexity too, which is why I mostly stopped using it, except for when I need a large search space covered fast and am willing to double-check anything that isn't obviously correct. Most of the time, it's o3. I guess this is the obligatory "are you using good enough models" part, but it really does make a difference. Even in ChatGPT, I don't use "web search" with the default model (gpt-4o) because I find it hallucinates or misinterprets results too much.
> The kind of stuff it does help researching with is usually the stuff that's easy to research without it anyway.
I disagree, but then maybe it's also a matter of attitude. I've seen co-workers do the exact same research as I did, in parallel, using the same tools (Perplexity and later o3); they tend to do it 5-10x faster than I do, but then they get bad results, and I don't.
Thing is, I have an unusually high need to own the understanding of any thing I'm learning. So where some co-workers are happy to vibe-check the output of o3 and then copy-paste it to team Notion and call their research done, I'll actually read it, and chase down anything that I feel confused about, and keep digging until things start to add up and I feel I have a consistent mental model of the topic (and know where the simplifications and unknowns are). Yes, sometimes I get lost in following tangents, and the whole thing takes much longer than I feel it should, but then I don't get "misled by the LLM".
I do the same with people, and sometimes they hate it, because my digging makes them feel like I don't trust them. Well, I don't - most people hallucinate way more than SOTA LLMs.
Still, the research I'm talking about, would not be easy to do without LLMs, at least not for me. The models let me dig through things that would otherwise be overwhelming or too confusing to me, or not feasible in the time I have for it.
Own your understanding. That's my rule.
> Thing is, I have an unusually high need to own the understanding of any thing I'm learning.
Same here. Don't get me wrong, LLMs can be helpful, but what I mean is that they can at best aid my research rather than perform it for me. In my experience, relying on them to do that would usually be disastrous - but they do sometimes help in cases where I feel stuck and would otherwise have to find some human to ask.
I guess it's the difference between "using LLMs while thinking" and "using LLM to do the thinking". The latter just does not work (unless all you're ever thinking about is trivial :P), the former can boost you up if you're smart about it. I don't think it's as big of a boost as many claim and it's still far from being reliable, but it's there and it's non-negligible. It's just that being smart about it is non-optional, as otherwise you end up with slop and don't even realize it.
> I have an unusually high need to own the understanding of any thing I'm learning
This is called deprivation sensitivity. It’s different from intellectual curiosity, where the former is a need to understand vs. the latter, which is a need to know.
Deprivation sensitivity comes with anxiety and stress. Where intellectual curiosity is associated with joyous exploration.
I score very high with deprivation sensitivity. I have unbridled drive to acquire and retain important information.
It’s a blessing and curse. An exhausting way to live. I love it but sometimes wish I was not neurodivergent.
You're not neurodivergent. You're a suffering conscious being just like everyone else. Anxiety and depression are caused by ignorance, not circumstance or personality traits, or anything else. With ignorance there is greed, and anger, and delusion. It is because there is no limit to the diversity of delusion that you cling to the view that you are neurodivergent, and otherwise hold the view that you exist in such and such relations to such and such entities and possess so and so qualities and essences. This is why it is said that ignorance alone is the cause of all mental suffering and dissatisfaction experienced by conscious beings.
Our brains are prediction machines. Anxiety is the anticipation of unpleasant experience, which comes from conditioning, not ignorance.
You can be completely aware of your experience and still feel anxiety. So your thinking is flawed.
Your response is telling. You are triggered by a benign comment and generalize harsh views towards all people.
You sound like a troubled young man who feels invisible.
I’m surprised it’s only 1/3rd. 90% of my searches for information start at Perplexity or Claude at this point.
Perplexity is too bulky for queries Kagi can handle[0], and I don't want to waste o3 quota[1] on trivial lookups.
--
[0] - Though I admit that almost all my Kagi searches end in "?" to trigger AI answer, and in ~50% of the cases, I don't click on any result.
[1] - Which AFAIK still exists on the Plus plan, though I haven't hit it in ~two months.
> One thing I find frustrating is that management where I work has heard of 10x productivity gains. Some of those claims even come from early adopters at my work.
Similar situation at my work, but all of the productivity claims from internal early adopters I've seen so far are based on very narrow ways of measuring productivity, and very sketchy math, to put it mildly.
> One thing I find frustrating is that management where I work has heard of 10x productivity gains.
That may also be in part because llms are not as big of an accelerant for junior devs as they are for seniors (juniors don't know what is good and bad as well).
So if you give 1 senior dev a souped up llm workflow I wouldn't be too surprised if they are as productive as 10 pre-llm juniors. Maybe even more, because a bad dev can actually produce negative productivity (stealing from the senior), in which case it's infinityx.
Even a decent junior is mostly limited to doing the low level grunt work, which llms can already do better.
Point is, I can see how jobs could be lost, legitimately.
The item lost is the pipeline of talent in all of this though.
Precision machining is going through an absolute nightmare where the journeymen or master machinists are aging out of the work force. These were people who originally learned on manual machines, and upgraded to CNC over the years. The pipeline collapsed about 1997.
Now there are no apprentice machinists to replace the skills of the retiring workforce.
This will happen to software developers. Probably faster because they tend to be financially independent WAY sooner than machinists.
> The item lost is the pipeline of talent in all of this though.
Totally agree.
However, I think this pipeline has been taking a hit for a while already because juniors as a whole have been devaluing themselves: if we expect them to leave after one year, what's the point of hiring and training them? Only helping their next employer at that point.
It's the employers who are responsible for the fact that almost everyone working in tech (across all skill levels) will have a far easier time advancing in both pay and title by jumping jobs often.
Very few companies put any real thought into meaningful retention but they are quick to complain about turnover.
Yes I agree it works both ways. Employment is a transaction and both sides are trying to optimize outcomes in their own best interest. No blame.
The health of the job market is a big factor as well.
That old canard? If you pay people in a way that incentivizes them to stay, they will. If you train people and treat them right and pay them right, they won't leave. If they are, try to fix one of those things, stop blaming the juniors for their massive collusion in a market where they literally are struggling to get jobs.
> However, I think this pipeline has been taking a hit for a while already because juniors as a whole have been devaluing themselves
I have seen the standards for junior devs in free fall for a few years as they hired tons of bootcamp fodder over the last few years. I have lost count of the number of whinging junior devs who think SQL or regex is 'too hard' for their poor little brains. No wonder they are being replaced by a probabilistic magician's hat.
> overall my productivity is 10-15% better. That is nothing to sneeze at, but it isn't 10x.
It is something to sneeze at if you are 10-15% more expensive to employ due to the cost of the LLM tools. The total cost of production should always be considered, not just throughput.
> It is something to sneeze at if you are 10-15% more expensive to employ due to the cost of the LLM tools.
Claude Max is $200/month, or ~2% of the salary of an average software engineer.
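Spelling out the arithmetic behind that 2% (a rough check; the assumption that it's measured against gross pay is mine):

    # Quick check of "$200/month ~= 2% of an average SWE salary"
    # (assuming the 2% is against gross pay, which is my reading).
    monthly_tool_cost = 200                            # Claude Max, USD/month
    implied_monthly_gross = monthly_tool_cost / 0.02   # 10,000 USD/month
    implied_annual_gross = implied_monthly_gross * 12  # 120,000 USD/year
    print(implied_monthly_gross, implied_annual_gross) # 10000.0 120000.0

So the 2% figure implies roughly $120k/year gross, which is in the right ballpark for US averages.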
Does anyone actually know what the real cost for the customers will be once the free AI money no longer floods those companies?
I'm no LLM evangelist, far from it, but I expect models of similar quality to the current bleeding edge will be freely runnable on consumer hardware within 3 years. Future bleeding-edge models may well be more expensive than current ones, who knows.
For the purpose of keeping the costs of LLM-dependent services down, you don't need to run bleeding-edge models on single consumer GPUs. Even if it takes a hundred GPUs, it still means people can start businesses around hosting those models, and compete with the large vendors.
How do the best models that can run on say a single 4090 today compare to GPT 3.5?
Qwen 2.5 32B which is an older model at this point clearly outperforms it:
https://llm-stats.com/models/compare/gpt-3.5-turbo-0125-vs-q...
Even when quantized down to 4 bits to fit on a 4090?
Not in my experience, running qwen3:32b is good, but it’s not as coherent or useful as 3.5 at a 4bit quant. But the gap is a lot narrower than llama 70b.
Yeah, there was an analysis that came out on Hacker News the other day. Between weak demand-side economics, virtually no impact on GDP, and corporate/VC subsidies going away soon, we're close to finding out. Sam Altman did convince SoftBank to do a $40B round though, so it might be another year or two. Current estimates are that it's cheaper to run than search, so it's likely more search features will be swapped over. OpenAI hasn't dropped their ad platform yet though, so I'm interested to see how that goes.
There's a potential for 100x+ lower cost of chips/energy for inference with compute-in-memory technology.
So they'll probably find a reasonable cost/value ratio.
Too cheap to meter? Inference is cheap and there's no long-term or even mid-term moat here.
As long as the courts don't shut down Meta over IP issues with Llama training data, that is.
I can't stress that enough: "open source" models are what can stop the "real costs" for the customers from growing. Despite popular belief, inference isn't that expensive. This isn't Uber - stopping isn't going to make LLMs infeasible; at worst, it's just going to make people pay API prices instead of subscription prices. As long as there are "open source" models that are legally available and track SOTA, anyone with access to some cloud GPUs can provide "SOTA of 6-12 months ago" for the price of inference, which puts a hard limit on how high OpenAI, et al. can hike the prices.
But that's only as long as there are open models. If Meta loses and Llama goes away, the chilling effect will just let OpenAI, Microsoft, Anthropic, and Google set whatever prices they want.
EDIT:
I mean Llama legally going away. Of course the cat is now out of the bag, the Pandora's box has been opened; the weights are out there and you can't untrain or uninvent them. But keeping the commercial LLM offerings' prices down requires a steady supply of improved open models, and the ability for smaller companies to make a legal business out of hosting them.
You can't just take cost of training out of the equation...
If these companies plan to stay afloat, they have to actually pay for the tens of billions they've spent at some point. That's what the parent comment meant by "free AI"
Yes, you can - because of Llama.
Training is expensive, but it's not that expensive either. It takes just one of those super-rich players to pay the training costs and then release the weights, to deny other players a moat.
If your economic analysis depends on "one of those super-rich players to pay" for it to work, it isn't as much analysis as wishful thinking.
All the 100s of billions of $ put into the models so far were not donations. They either make it back to the investors or the show stops at some point.
And with a major chunk of proponents' arguments being "it will keep getting better", if you lose that, what have you got? "This thing can spit out boilerplate code, re-arrange documents and sometimes corrupts data silently and in hard-to-detect ways, but hey, you can run it locally and cheaply"?
The economic analysis is not mine, and I thought it was pretty well-known by now: Meta is not in the compute biz and doesn't want to be in it, so by releasing Llamas, it denies Google, Microsoft and Amazon the ability to build a moat around LLM inference. Commoditize your complement and all that. Meta wants to use LLMs, not sell access to them, so occasionally burning a billion dollars to train and give away an open-weight SOTA model is a good investment, because it directly and indirectly keeps inference cheap for everyone.
You understand that according to what you just said, economically the current SOTA is untenable?
Which, again, leads to a future where we're stuck with local models corrupting data about half the time.
No, it just means that the big players have to keep advancing SOTA to make money; Llama lagging ~6 months behind just means there's only so much they can charge for access to the bleeding edge.
Short-term, it's a normal dynamics for a growing/evolving market. Long-term, the Sun will burn out and consume the Earth.
The cost to improve training increases exponentially for every milestone. No vendor is even coming close to recouping the costs now. Not to mention quality data to feed the training.
The R&D is running on hopes that increasing the magnitude (yes, actual magnitudes) of their models will eventually hit a miracle that makes their company explode in value and power. They can't explain what that could even look like... but they NEED evermore exorbitant amounts of funding flowing in.
This truly isn't a normal ratio of research-to-return.
Luckily, what we do have already is kinda useful and condensing models does show promise. In 5 years I doubt we'll have the post-labor dys/utopia we're being hyped up for. But we may have some truly badass models that can run directly on our phones.
Like you said, Llama and local inference is cheap. So that's the most logical direction all of this is taking us.
Nah, the vendors have generally been open about the limits of scaling. The bet isn't on that one last order of magnitude increase will hit a miracle - the bet is on R&D figuring out a new way to get better model performance before the last one hits diminishing returns. Which, for now, is what's been consistently happening.
There's risk to that assumption, but it's also a reasonable one - let's not forget the whole field is both new and has seen stupid amounts of money being pumped into it over the last few years; this is an inflationary period, there's tons of people researching every possible angle, but that research takes time. It's a safe bet that there are still major breakthroughs ahead us, to be achieved within the next couple years.
The risky part for the vendors is whether they'll happen soon enough so they can capitalize on them and keep their lead (and profits) for another year or so until the next breakthrough hits, and so on.
If Llama goes away, we would still get models from China that don't respect the laws that shut down Llama; at least until China is on top, they will continue to undercut using open-source models. Either way, open models will continue to exist.
Rapid progress in open source says otherwise.
In the US, maybe. Several times that by percentage in other places around the world.
the average software engineer makes $10000 a month after taxes?!
> if you are 10-15% more expensive to employ due to the cost of the LLM tools
How is one spending anywhere close to 10% of total compensation on LLMs?
That's a good insight, because with perfect competition it means you need to share your old salary with an LLM!
It's just another tech hype wave. Reality will be somewhere between total doom and boundless utopia. But probably neither of those.
The AI thing kind of reminds me of the big push to outsource software engineers in the early 2000's. There was a ton of hype among executives about it, and it all seemed plausible on paper. But most of those initiatives ended up being huge failures, and nearly all of those jobs came back to the US.
People tend to ignore a lot of the little things that glue it all together that software engineers do. AI lacks a lot of this. Foreigners don't necessarily lack it, but language barriers, time zone differences, cultural differences, and all sorts of other things led to similar issues. Code quality and maintainability took a nosedive and a lot of the stuff produced by those outsourced shops had to be thrown in the trash.
I can already see the AI slop accumulating in the codebases I work in. It's super hard to spot a lot of these things that manage to slip through code review, because they tend to look reasonable when you're looking at a diff. The problem is all the redundant code that you're not seeing, and the weird abstractions that make no sense at all when you look at it from a higher level.
This was what I was saying to a friend the other day. I think anyone vaguely competent that is using LLMs will make the technology look far better than it is.
Management thinks the LLM is doing most of the work. Work is off shored. Oh, the quality sucks when someone without a clue is driving. We need to hire again.
On my personal projects it's easily 10x faster if not more in some circumstances. At work where things are planned out months in advance and I'm working with 5 different teams to figure out the right way to do things for requirements that change 8 times during development? Even just stuff with PR review and making sure other people understand it and can access it. Idk, sometimes it's probably break-even or that 10-15%. It just doesn't work well in some environments and what really makes it flourish (having super high quality architectural planning/designs/standardized patterns etc.) is basically just not viable at anything but the smallest startups and solo projects.
Frankly, even just getting engineers to agree upon those super-specific standardized patterns is asking a ton, especially since lots of the things that help AI out are not what they are used to. As soon as you have stuff that starts deviating it can confuse the AI and makes that 10x no longer accessible. Also, no one would want to review the PRs I'd make for the changes I do on my "10x" local project... Maintaining those standards is already hard enough on my side projects; AI will naturally deviate and create noise, and the challenge is constructing systems to guide it so that nothing deviates (since noise would lead to more noise).
I think it's mostly a rebalancing thing: if you have one or a couple of like-minded engineers who intend to do it, they can get that 10x. I do not see that EVER existing in any actual corporate environment, or even once you get more than like 4 people tbh.

AI for middle management and project planning, on the other hand...
I don't disagree with your assessment of the world today, but just 12 months ago (before the current crop of base models and coding agents like Claude Code), even that 10X improvement of writing some-of-the-code wouldn't have been true.
> just 12 months ago (before the current crop of base models and coding agents like Claude Code), even that 10X improvement of writing some-of-the-code wouldn't have been true.
You had to paste more into your prompts back then to make the output work with the rest of your codebase, because there weren't good IDEs/"agents" for it, but you've been able to get really, really good code for 90% of "most" day-to-day SWE work since at least OpenAI released the GPT-4 API, which was a couple of years ago.
Today it's a lot easier to demo low-effort "make a whole new feature or prototype" things than doing the work to make the right API calls back then, but most day to day work isn't "one shot a new prototype web app" and probably won't ever be.
I'm personally more productive than 1 or 2 years ago now because the time required to build the prompts was slower than my personal rate of writing code for a lot of things in my domain, but hardly 10x. It usually one-shots stuff wrong, and then there's a good chance that it'll take longer to chase down the errors than it would've to just write the thing - or only use it as "better autocomplete" - in the first place.
> I don't disagree with your assessment of the world today, but just 12 months ago (before the current crop of base models and coding agents like Claude Code), even that 10X improvement of writing some-of-the-code wouldn't have been true.
So? It sounds like you're prodding us to make an extrapolation fallacy (I don't even grant the "10x in 12 months" point, but let's just accept the premise for the sake of argument).
Honestly, 12 months ago the base models weren't substantially worse than they are right now. Some people will argue with me endlessly on this point, and maybe they're a bit better on the margin, but I think it's pretty much true. When I look at the improvements of the last year with a cold, rational eye, they've been in two major areas:

* cost & efficiency

* UI & integration

So how do we improve from here? Cost & efficiency are the obvious lever with historical precedent: GPUs kinda suck for inference, and costs are (currently) rapidly dropping. But, maybe this won't continue -- algorithmic complexity is what it is, and barring some revolutionary change in the architecture, LLMs are exponential algorithms.
UI and integration is where most of the rest of the recent improvement has come from, and honestly, this is pretty close to saturation. All of the various AI products already look the same, and I'm certain that they'll continue to converge to a well-accepted local maximum. After that, huge gains in productivity from UX alone will not be possible. This will happen quickly -- probably in the next year or two.
Basically, unless we see a Moore's law of GPUs, I wouldn't bet on indefinite exponential improvement in AI. My bet is that, from here out, this looks like the adoption curve of any prior technology shift (e.g. mainframe -> PC, PC -> laptop, mobile, etc.) where there's a big boom, then a long, slow adoption for the masses.
12 months ago, we had no reasoning models and even very basic arithmetic was outside of the models' grasp. Coding assistants mostly worked on the level of tab-completing individual functions, but now I can one-shot demo-able prototypes (albeit nothing production-ready) of webapps. I assume you consider the latter "integration", but I think coding is so key to how the base models are being trained that this is due to base model improvements too. This is testable - it would be interesting to get something like Claude Code running on top of a year-old open source model and see how it does.
If you're going to call all of that not substantial improvement, we'll have to agree to disagree. Certainly it's the most rapid rate of improvement of any tech I've personally seen since I started programming in the early '00s.
I consider the reasoning models to be primarily a development of efficiency/cost, and I thought the first one was about a year ago, but sure, ok. I don’t think it changes the argument I’m making. The LLM ouroboros / robot centipede has been done, and is not itself a path towards exponential improvement.
To be quite honest, I’ve found very little marginal value in using reasoning models for coding. Tool usage, sure, but I almost never use “reasoning” beyond that.
Also, LLMs still cannot do basic math. They can solve math exams, sure, but you can’t trust them to do a calculation in the middle of a task.
> but you can’t trust them to do a calculation in the middle of a task.
You can't trust a person either. Calculating is its own mode of thinking; if you don't pause and context switch, you're going to get it wrong. Same is the case with LLMs.
Tool usage, reasoning, and the "agentic approach" are all in part ways of allowing the LLM to do the context switch required, instead of taking the math challenge as it goes and blowing it.
The proper comparison is not a human, it’s a computer. Or even a human with a computer.
But my point wasn’t to judge LLMs on their (in)ability to do math - I was only responding to the parent comment’s assertion that they’ve gotten better in this area.
It’s worth noting that all of the major models still randomly decide to ignore schemas and tool calls, so even that is not a guarantee.
12 months ago, if I fed a list of ~800 poems with about ~250k tokens to an LLM and asked it to summarize this huge collection, they would be completely blind to some poems and were prone to hallucinating not simply verses but full-blown poems. I was testing this with every available model out there that could accept 250k tokens. It just wouldn't work. I also experimented with a subset that was at around ~100k tokens to try other models and results were also pretty terrible. Completely unreliable and nothing it said could be trusted.
Then Gemini 2.5 pro (the first one) came along and suddenly this was no longer the case. Nothing hallucinated, incredible pattern finding within the poems, identification of different "poetic stages", and many other rather unbelievable things — at least to me.
After that, I realized I could start sending in more of those "hard to track down" bugs to Gemini 2.5 pro than other models. It was actually starting to solve them reliably, whereas before it was mostly me doing the solving and models mostly helped if the bug didn't occur as a consequence of very complex interactions spread over multiple methods. It's not like I say "this is broken, fix it" very often! Usually I include my ideas for where the problem might be. But Gemini 2.5 pro just knows how to use these ideas better.
I have also experimented with LLMs consuming conversations, screenshots, and all kinds of ad-hoc documentation (e-mails, summaries, chat logs, etc) to produce accurate PRDs and even full-on development estimates. The first one that actually started to give good results (as in: it is now a part of my process) was, you guessed it, Gemini 2.5 pro. I'll admit I haven't tried o3 or o4-mini-high too much on this, but that's because they're SLOOOOOOOOW. And, when I did try, o4-mini-high was inferior and o3 felt somewhat closer to 2.5 pro, though, like I said, much much slower and...how do I put this....rude ("colder")?
All this to say: while I agree that perhaps the models don't feel like they're particularly better at some tasks which involve coding, I think 2.5 pro has represented a monumental step forward, not just in coding, but definitely overall (the poetry example, to this day, still completely blows my mind. It is still so good it's unbelievable).
> 12 months ago, if I fed a list of ~800 poems with about ~250k tokens to an LLM and asked it to summarize this huge collection, they would be completely blind to some poems and were prone to hallucinating not simply verses but full-blown poems.
for the past week claude code has been routinely ignoring CLAUDE.md and every single instruction in it. I have to manually prompt it every time.
As I was vibe coding the notes MCP mentioned in the article [1] I was also testing it with claude. At one point it just forgot that MCPs exist. It was literally this:
    > add note to mcp
    Calling mcp:add_note_to_project

    > add note to mcp
    Running find mcp.ex ...
    Interrupted by user ...

    > add note to mcp
    Running <convoluted code generation command with mcp in it>

We have no objective way of measuring performance and behavior of LLMs.
Your comment warrants a longer, more insightful reply than I can provide, but I still feel compelled to say that I get the same feeling from o3. Colder, somewhat robotic and unhelpful. It's like the extreme opposite of 4o, and I like neither.
My weapon of choice these days is Claude 4 Opus but it's slow, expensive and still not massively better than good old 3.5 Sonnet
Exactly! Here's my take:
4o tends to be, as they say, sycophantic. It's an AI masking as a helpful human, a personal assistant, a therapist, a friend, a fan, or someone on the other end of a support call. They sometimes embellish things, and will sometimes take a longer way getting to the destination if it makes for what may be a more enjoyable conversation — they make conversations feel somewhat human.
OpenAI's reasoning models, though, feel more like an AI masking as a code slave. It is not meant to embellish, to beat around the bush or to even be nice. Its job is to give you the damn answer.
This is why the o* models are terrible for creative writing, for "therapy" or pretty much anything that isn't solving logical problems. They are built for problem solving, coding, breaking down tasks, getting to the "end" of it. You present them a problem you need solved and they give you the solution, sometimes even omitting the intermediate steps because that's not what you asked for. (Note that I don't get this same vibe from 2.5 at all)
Ultimately, it's this "no-bullshit" approach that feels incredibly cold. It often won't even offer alternative suggestions, and it certainly doesn't bother about feelings because feelings don't really matter when solving problems. You may often hear 4o say it's "sorry to hear" about something going wrong in your life, whereas o* models have a much higher threshold for deciding that maybe they ought to act like a feeling machine, rather than a solving machine.
I think this is likely pretty deliberate of OpenAI. They must for some reason believe that if the model is more concise in its final answers (though not necessarily in the reasoning process, which we can't really see), then it produces better results. Or perhaps they lose less money on it, I don't know.
Claude is usually my go-to model if I want to "feel" like I'm talking to more of a human, one capable of empathy. 2.5 pro has been closing the gap, though. Also, Claude used to be by far much better than all other models at European Portuguese (+ Portuguese culture and references in general), but, again, 2.5 pro seems just as good nowadays.
On another note, this is also why I also completely understand the need for the two kinds of models for OpenAI. 4o is the model I'll use to review an e-mail, because it won't just try to remove all the humanity of it and make it the most succinct, bland, "objective" thing — which is what the o* models will.
In other words, I think: (i) o* models are supposed to be tools, and (ii) 4o-like models are supposed to be "human".
What exactly are you basing any of your assertions off of?
The same sort of rigorous analysis that the parent comment used (that’s a joke, btw).
But seriously: If you find yourself agreeing with one and not the other because of sourcing, check your biases.
It still isn't.
It’s great when they use AI to write a small app “without coding at all” over the weekend and then come in on Monday to brag about it and act baffled that tasks take engineers any time at all.
How much of the communication and meetings are because traditionally code was very expensive and slow to create? How many of those meetings might be streamlined or entirely disappear in the future? In my experience there is a lot of process around making sure that software is on schedule and that it's doing what it is supposed to do. I think that the software lifecycle is about to be reinvented.
The reports from analysis of open source projects are that it's something in the range of 10%-15% productivity gains... so it sounds like you're spot on.
That's about right for copilots. It's much higher for agentic coding.
[citation needed]
Agentic coding has really only taken off in the last few weeks due to better pricing.
Wait till they hear about the productivity gains from using vim/neovim.
Your developers still push a mouse around to get work done? Fire them.
Expectations are absolutely way too high. It's going to lead to a lot of toxicity and people being fired. It's really going to suck.
Canva has seen a 30% productivity uplift - https://fortune.com/2025/06/25/canva-cto-encourages-all-5000...
AI is the new uplift. Embrace and adapt, as a rift is forming in what employers seek in terms of skills from employees (see my talk at https://ghuntley.com/six-month-recap/).
I'm happy to answer any questions folks may have. Currently AFK [2] vibecoding a brand new programming language [1].
[1] https://x.com/GeoffreyHuntley/status/1940964118565212606 [2] https://youtu.be/e7i4JEi_8sk?t=29722
There’s something hilariously Portlandia about making outlandish claims with complete confidence and then plugging your own talk.
There’s citations to the facts in the links.
And that's with 50% adoption and probably a broad distribution of tool use skill.
> The productivity for software engineers is at around 30%
That would be a 70% descent?
I’m a tech lead and I have maybe 5X the output now compared to everybody else under me. Quantified by scoring tickets at a team level. I also have more responsibilities outside of IC work compared to the people under me. At this point I’m asking my manager to fire people that still think llms are just toys because I’m tired of working with people with this poor mindset. A pragmatic engineer continually reevaluates what they think they know. We are at a tipping point now. I’m done arguing with people that have a poor model of reality. The rest of us are trying to compete and get shit done. This isn’t an opinion or a game. It’s business with real life consequences if you fall behind. I’ve offered to share my workflows, prompts, setup. Guess how many of these engineers have taken me up on my offer: 1-2. The juniors and the ones that are very far behind have not.
It’s funny. We fired someone with this attitude Thursday. And by this attitude I mean yours.
Not necessarily because of their attitude but because it turns out the software they were shipping was rife with security issues. Security managed to quickly detect and handle the resulting incident. I can’t say his team were sad to see him go.
Are you the one at Ableton responsible for it ignoring the renaming of parameter names during the setState part of a Live program? Some of us are already jumping through ridiculous hoops to cover for your… mindset. There's stuff coming up that used to work and doesn't now, like in Live 12. From your response I would guess this is a trend that will hold.
We should not be having to code special 'host is Ableton Live' cases in JUCE just to get your host to work like the others.
Can you please not fire any people who are still holding your operation together?
Why do you think this person works at Ableton? From their comments it doesn't seem that they would be a fit for a small, cool Berlin company making tools for techno.
You've been doing the big I am about LLMs on HN for most of your last comments.
Everyone else who raises any doubts about LLMs is an idiot and you're 10,000x better than everyone else and all your co-workers should be fired.
But what's absent from all your comments is what you make. Can you tell us what you actually do in your >500k job?
Are you, by any chance, a front-end developer?
Also, a team-lead that can't fire their subordinates isn't a team-lead, they're a number two.
I will thank God every day I don’t work with you or for you. How toxic.
im glad I don’t have to work with you too lol.
It’s not toxic for me to expect someone to get their work done in a reasonable amount of time with the tools available to them. If you’re an accountant and you take 5X the time to do something because you have beef with Excel, you’re the problem. It’s not toxicity to tell you that you are a bad accountant.
You believe the cost of firing and rehiring to be cheaper than simple empirical persuasion?
You don't sound like a great lead to me, but I suppose you could be working with absolutely incompetent individuals, or perhaps your soft skills need work.
My apologies but I see only two possibilities for others not to take the time to follow your example given such strong evidence. They either actively dislike you or are totally incompetent. I find the former more often true than the latter.
You have about 50% of HN thinking LLMs are useless and you’re commenting on an article about how it’s still magical and wishful thinking, and that this is crypto all over again. But sure, the problem is me, not the people with a poor model of reality
> You have about 50% of HN thinking LLMs are useless and you’re commenting on an article about how it’s still magical and wishful thinking,
Perhaps you should try reading the article again (or maybe let some LLM summarize it for you)
> But sure, the problem is me, not the people with a poor model of reality
It's amazing how you almost literally use crypto-talk
You believe the cost of firing and rehiring to be cheaper than simple empirical persuasion?
My apologies but that does not sound like good leadership to me. It actually sounds like you may have deficiencies in your skills as it relates to leadership. Perhaps in a few years we will have an LLM who can provide better leadership.
> I’m done arguing with people that have a poor model of reality.
isn't this the entire LLM experience?
A new copypasta is born.
Go back to reddit
"I’ve offered to share my workflows, prompts" That should all be checked in.
It’s checked in, they have just written off llms
Dude, if you are a tech lead, and you measure productivity by scoring tickets, you are doing it pretty badly. I would fire you instead.
You seem completely insufferable and incredibly cringeworthy.
I have to say I’m in the exact camp the author is complaining about. I’ve shipped non trivial greenfield products which I started back when it was only ChatGPT and it was shitty. I started using Claude with copying and pasting back and forth between the web chat and XCode. Then I discovered Cursor. It left me with a lot of annoying build errors, but my productivity was still at least 3x. Now that agents are better and claude 4 is out, I barely ever write code, and I don’t mind. I’ve leaned into the Architect/Manager role and direct the agent with my specialized knowledge if I need to.
I started a job at a demanding startup and it’s been several months and I have still not written a single line of code by hand. I audit everything myself before making PRs and test rigorously, but Cursor + Sonnet is just insane with their codebase. I’m convinced I’m their most productive employee and that’s not by measuring lines of code, which don’t matter; people who are experts in the codebase ask me for help with niche bugs I can narrow in on in 5-30 minutes as someone who’s fresh to their domain. I had to lay off taking work away from the front end dev (which I’ve avoided my whole career) because I was stepping on his toes, fixing little problems as I saw them thanks to Claude. It’s not vibe coding - there’s a process of research and planning and perusing in careful steps, and I set the agent up for success. Domain knowledge is necessary. But I’m just so floored how anyone could not be extracting the same utility from it. It feels like there’s two articles like this every week now.
But you just confirmed everything the blogpost claimed.
You didn't share any evidence with us even though you claim unbelievable things.
You even went as far as registering a throwaway account to hide your identity and to make verifying any of your claims impossible.
Your comment feels more like a joke to me
... this from an account with <100 karma.
Look, the person who wrote that comment doesn't need to prove anything to you just because you're hopped up after reading a blog post that has clearly given you a temporary dopamine bump.
People who understand their domains well and are excellent written communicators can craft prompts that will do what we used to spend a week spinning up. It's self-evident to anyone in that situation, and the only thing we see when people demand "evidence" is that you aren't using the tools properly.
We don't need to prove anything because if you are working on interesting problems, even the most skeptical person will prove it to themselves in a few hours.
Feeling triggered? Feeling afraid? And yes, every claim needs to be proven, otherwise those who make the claims will only convince 4 year olds.
>People who understand their domains well and are excellent written communicators can craft prompts that will do what we used to spend a week spinning up. It's self-evident to anyone in that situation, and the only thing we see when people demand "evidence" is that you aren't using the tools properly.
You have no proof of this, so I guess you chose your camp already?
Same experience here, probably in a slightly different way of work (PhD student). Was extremely skeptical of LLMs, Claude Code has completely transformed the way I work.
It doesn't take away the requirement of _curation_ - that remains firmly in my camp (partially what a PhD is supposed to teach you! to be precise and reflective about why you are doing X, what do you hope to show with Y, etc -- break down every single step, explain those steps to someone else -- this is a tremendous soft skill, and it's even more important now because these agents do not have persistent world models / immediately forget the goal of a sequence of interactions, even with clever compaction).
If I'm on my game with precise communication, I can use CC to organize computation in a way which has never been possible before.
It's not easier than programming (if you care about quality!), but it is different, and it comes with different idioms.
I find that the code quality LLMs output is pretty bad. I end up going through so many iterations that it ends up being faster to do it myself. What I find agents actually useful for is doing large scale mechanical refactors. Instead of trying to figure out the perfect vim macro or AST rewrite script, I'll throw an agent at it.
I disagree strongly at this point. The code is generally good if the prompt was reasonable at this point but also every test possible is now being written, every UI element has all the required traits, every function has the correct documentation attached, the million little refactors to improve the codebase are being done, etc.
Someone told me ‘ai makes all the little things trivial to do’ and i agree strongly with that. Those many little things are things that together make a strong statement about quality. Our codebase has gone up in quality significantly with ai whereas we’d let the little things slide due to understaffing before.
> The code is generally good if the prompt was reasonable at this point
Which, again, is 100% unverifiable and cannot be generalized. As described in the article.
How do I know this? Because, as I said in the article, I use these tools daily.
And "prompt was reasonable" is a yet another magical incantation that may or may not work. Here's my experience: https://news.ycombinator.com/item?id=44470144
> The code is generally good if the prompt was reasonable
The point is writing that prompt takes longer than writing the code.
> Someone told me ‘ai makes all the little things trivial to do’ and i agree strongly with that
Yeah, it's great for doing all of those little things. It's bad at doing the big things.
> The point is writing that prompt takes longer than writing the code.
Luckily we can reuse system prompts :) Mine usually contains something like https://gist.github.com/victorb/1fe62fe7b80a64fc5b446f82d313... + project-specific instructions, which is reused across sessions.
Currently, it does not take the same amount of time to prompt as if I was to write the code.
Have to disagree with this too - ask an LLM to architect a project, or propose a cleaner solution, and it usually does a good job.
Where it still sucks is doing both at once. Thus the shift to integrating "to do" lists in Cursor. My flow has shifted to "design this feature" then "continue to implement" 10 times in a row with code review between each step.
> I find that the code quality LLMs output is pretty bad.
That was my experience with Cursor, but Claude Code is a different world. What specific product/models brought you to this generalization?
Claude Code depending on weather, phase of the moon, and compute availability at a specific point in time: https://news.ycombinator.com/item?id=44470144
What sort of mechanical refactors?
"Find all places this API is used and rewrite it using these other APIs."
> I audit everything myself before making PRs and test rigorously
How do you audit code from an untrusted source that quickly? LLMs do not have the whole project in their heads and are prone to hallucinating.
On average how long are your prompts and does the LLM also write the unit tests?
The auditing is not quick. I prefer cursor to claude code because I can review its changes while it’s going more easily and stop and redirect it if it starts to veer off course (which is often, but the cost of doing business). Over time I still gain an understanding of the codebase that I can use to inform my prompts or redirection, so it’s not like I’m blindly asking it to do things. Yes, I do ask it to write unit tests a lot of the time. But I don’t have it spin off and just iterate until the unit tests pass — that’s a recipe for it to do what it needs to do to pass them and is counterproductive. I plan what I want the set of tests to look like and have them write functions in isolation without mentioning tests, and if tests fail I go through a process of auditing the failing code and then the tests themselves to make sure nothing was missed. It’s exactly how I would treat a coworkers code that I review. My prompts range from a few sentences to a few paragraphs, and nowadays I construct a large .md file with a checklist that we iterate on for larger refactors and projects to manage context
I use Claude code for hours a day, it’s a liar, trust what it does at your own risk.
I personally think you’re sugar coating the experience.
It lies with such enthusiasm though.
Recently I worked with a weird C flavor (Monkey C); it hallucinated every single method, all the time, every time.
I know it's just a question of time, likely. However, that was soooo far from helpful. And it was so sure it was doing it right, again and again, without ever consulting the docs.
> I use Claude code for hours a day, it’s a liar, trust what it does at your own risk.
The person you're responding to literally said, "I audit everything myself before making PRs and test rigorously".
I didn't see that but I assume they edited their comment.
Please re-read the article. Especially the first list of things we don't know about you, your projects etc.
Your specific experience cannot be generalized. And speaking as the author, and who is (as written in the article) literally using these tools everyday.
> But I’m just so floored how anyone could not be extracting the same utility from it. It feels like there’s two articles like this every week now.
This is where we learn that you haven't actually read the article. Because it is very clearly stating, with links, that I am extracting value from these tools.
And the article is also very clearly not about extracting or not extracting value.
I did read the entire article before commenting and acknowledge that you are using them to some effect, but the line about 50% of the time it works 50% of the time is where I lost faith in the claims you’re making. I agree it’s very context dependent but, in the same way, you did not outline your approaches and practices in how you use AI in your workflow. The same lack of context exists on the other side of the argument.
I agree about the 50/50 thing. It's about how much Claude helped me, and I use it daily too.
I'll give some context, though.
- I use OCaml and Python/SQL, on two different projects.
- Both are single-person.
- The first project is a real-time messaging system, the second one is logging a bunch of events in an SQL database.
In the first project, Claude has been... underwhelming. It casually uses C idioms, overuses records and procedural programming, ignores basic stuff about the OCaml standard library, and even gave me some data structures that slowed me down later down the line. It also casually lies about what functions do.
A real example: `Buffer.add_utf_8_uchar` adds the ASCII representation of a UTF-8 char to a buffer, so it adds something that looks like `\123\456` for non-ASCII.
I had to scold Claude for using this function to add a UTF-8 character to a Buffer so many times I've lost count.
In the second project, Claude really shined. Making most of the SQL database and moving most of the logic to the SQL engine, writing coherent and readable Python code, etc.
I think the main difference is that the first one is an arcane project in an underdog language. The second one is a special case of a common "shovel through lists of stuff and stuff them in SQL" problem, in the most common language.
You basically get what you trained for.
Just FYI, try adding a comment to that function saying what it is intended to be used for. Without more info, LLMs will rely strongly on function names. Heck, have the LLM add comments to every function and I bet it will start to do better.
It's not my function in the example, it's a standard library function. It does have a weird name though.
> but the line about 50% of the time it works 50% of the time is where I lost faith in the claims you’re making.
It's a play on the Anchorman joke that I slightly misremembered: "60% of the time it works 100% of the time"
> is where I lost faith in the claims you’re making.
Ah yes. You lost faith in mine, but I have to have 100% faith in your 100% unverified claim about "job at a demanding startup" where "you still haven't written a single line of code by hand"?
Why do you assume that your word and experience is more correct than mine? Or why should anyone?
> you did not outline your approaches and practices in how you use AI in your workflow
No one does. And if you actually read the article, you'd see that is literally the point.
> …the line about 50% of the time it works 50% of the time is where I lost faith in the claims you’re making…
That's where the author lost me as well. I'd really be interested in a deep dive on their workflow/tools to understand how I've been so unbelievably lucky in comparison.
Sibling comment: https://news.ycombinator.com/item?id=44468374
> I started a job at a demanding startup and it’s been several months and I have still not written a single line of code by hand
Damn, this sounds pretty boring.
It’s not. It’s like I used to play baseball professionally and now I’m a coach or GM building teams and yielding results. It’s a different set of skills. I’m working mostly in idea space and seeing my ideas come to life with a faster feedback loop and the toil is mostly gone
> I’ve shipped non trivial greenfield products
Links please
Here's maybe the most impressive thing I've vibecoded, where I wanted to track a file write/read race condition in a vscode extension: https://github.com/go-go-golems/go-go-labs/tree/main/cmd/exp...
This is _far_ from web crud.
Otherwise, 99% of my code these days is LLM generated, there's a fair amount of visible commits from my opensource on my profile https://github.com/wesen .
A lot of it is more on the system side of things, although there are a fair amount of one-off webapps, now that I can do frontends that don't suck.
I’d like to, but purposefully am using a throwaway account. It’s an iOS app rated 4.5 stars on the app store and has a nice community. Mild userbase, in the hundreds.
> but my productivity was still at least 3x
How do you measure this?
Mean time to shipping features of various estimated difficulty. It’s subjective and not perfect, but generally speaking I need to work way less. I’ll be honest, one thing I think I could have done faster without AI was to implement CRDT-based cloud sync for a project I have going. I think I’ve tried to utilize AI too much for this. It’s good at implementing vector clocks, but not at preventing race conditions.
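(For reference, the vector-clock side of that is the mechanical bookkeeping sketched below, which the models handle fine; the part they kept fumbling for me was everything around it when real clients race each other. A minimal sketch, not my actual sync code:)

    # Minimal vector clock: replica id -> counter.
    from typing import Dict

    Clock = Dict[str, int]

    def tick(clock: Clock, replica: str) -> Clock:
        """Local event on `replica`: bump its own counter."""
        out = dict(clock)
        out[replica] = out.get(replica, 0) + 1
        return out

    def merge(a: Clock, b: Clock) -> Clock:
        """Element-wise max, applied when receiving a remote update."""
        return {k: max(a.get(k, 0), b.get(k, 0)) for k in a.keys() | b.keys()}

    def compare(a: Clock, b: Clock) -> str:
        """Returns 'equal', 'before', 'after', or 'concurrent' (i.e. a conflict to resolve)."""
        keys = a.keys() | b.keys()
        a_le_b = all(a.get(k, 0) <= b.get(k, 0) for k in keys)
        b_le_a = all(b.get(k, 0) <= a.get(k, 0) for k in keys)
        if a_le_b and b_le_a:
            return "equal"
        if a_le_b:
            return "before"
        if b_le_a:
            return "after"
        return "concurrent"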
Are you sure it wasn't just stealing from open source projects? If so, you could just cut out the middle man.
And you created an account just to write this unbelievable claim?
A bit suspicious, wouldn’t you agree?
> there’s a process of research and planning and perusing in careful steps, and I set the agent up for success
Are there any good articles you can share or maybe your process? I’m really trying to get good at this but I don’t find myself great at using agents and I honestly don’t know where to start. I’ve tried the memory bank in cline, tried using more thinking directives, but I find I can’t get it to do complex things and it ends up being a time sink for me.
More anecdata: +1 for “LLMs write all my production code now”. 25+ years in industry, as expert as it’s possible to be in my domain. 100% agree LLMs fail hilariously badly, often, and dangerously. And still, write ~all my code.
No agenda here, not selling anything. Just sitting here towards the later part of my career, no need to prove anything to anyone, stating the view from a grey beard.
Crypto hype was shilling from grifters pumping whatever bag-holding scam they could, which was precisely what the behavioral economic incentives drove. GenAI dev is something else. I've watched many people working with it; your mileage will vary. But in my opinion (and it's mine, you do you), hand coding is becoming an anachronism. The only part I wonder about is how far up and down the system/design/architecture stack the power-tooling is going to go. My intuition and empirical findings incline towards a direction I think would fuel a flame war. But I'm just a grey beard Internet random, and hey look, no evidence, just more baseless claims. Nothing to see here.
Disclosure: I hold no direct shares in Mag 7, nor do I work for one.
Web dev CRUD in node?
Multi platform web+native consumer application with lots of moving parts and integration. I think to call it a CRUD app would be oversimplifying it.
I personally don't really get this.
_So much_ work in the 'services' industries globally really comes down to a human transposing data from one Excel sheet to another (or from a CRM/emails to Excel), manually. Every (or nearly every) enterprise-scale company will have hundreds if not thousands of FTEs doing this kind of work day in, day out, often with a lot of it outsourced. I would guess that for every 1 software engineer there are 100 people doing this kind of 'manual data pipelining'.
So really, for giant value to be created out of LLMs you do not need them to be incredible at OCaml. They just need to ~outperform humans on Excel. Where I do think MCP really helps is that you can connect all these systems together easily, and a lot of the errors in this kind of work come from trying to pass the entire 'task' into the context. If you can take an email via MCP, extract some data out of it and put it into a CRM (again via MCP) a row at a time, the hallucination rate is very low IME. I would say at least the level of an overworked junior human.
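To make the row-at-a-time point concrete, here is a rough sketch of the shape of that loop; fetch_emails, extract_fields and crm_insert are hypothetical stand-ins for whatever your MCP servers actually expose, not real API names:

    # Sketch: process one email per LLM call instead of passing the whole task.
    def process_inbox(fetch_emails, extract_fields, crm_insert):
        failures = []
        for email in fetch_emails():            # one small unit of work at a time
            try:
                row = extract_fields(email)      # one LLM call, tiny context
                crm_insert(row)                  # one auditable write to the CRM
            except ValueError as err:            # bad extractions get flagged,
                failures.append((email, err))    # not silently inserted
        return failures                          # a human reviews the leftovers

Keeping each call down to one email and one row is what keeps the hallucination rate low; the human only has to look at the failure pile.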
Perhaps this was the point of the article, but non-determinism is not an issue for these kinds of use cases, given that the humans involved are not deterministic either. We can build systems and processes to enforce quality on non-deterministic (e.g. human) systems.
Finally, I've followed crypto closely and LLMs closely too. They do not seem similar in terms of utility or adoption. The closest thing I can recall is smartphone adoption. A lot of my non-technical friends didn't think they wanted a smartphone when the iPhone first came out. Within a few years, all of them had one. Similar with LLMs: virtually all of my non-technical friends use them now, for incredibly varied use cases.
Making a comparison to crypto is lazy criticism. It’s not even worth validating. It’s people who want to take the negative vibe from crypto and repurpose it. The two technologies have nothing to do with each other, and therefore there’s clearly no reason to make comparative technical assessments between them.
That said, the social response is a trend of tech worship that I suspect many engineers who have been around the block are weary of. It’s easy to find unrealistic claims, the worst coming from the CEOs of AI companies.
At the same time, a LOT of people are practically computer illiterate. I can only imagine how exciting it must seem to people who have very limited exposure to even basic automation. And the whole “talking computer” we’ve all become accustomed to seeing in science fiction is pretty much becoming reality.
There’s a world of takes in there. It’s wild.
I worked in ML and NLP for several years before the current AI wave. What's most striking to me is that this is way more mainstream than anything that has ever happened in the field. And with that comes a lot of inexperience in designing with statistical inference. It's going to be the Wild West for a while — in opinions, in successful implementation, in learning how to form realistic project ideas.
Look at it this way: now your friend with a novel app idea can be told to do it themselves. That’s at least a win for everyone.
> Look at it this way: now your friend with a novel app idea can be told to do it themselves. That’s at least a win for everyone.
For now, anyways. Thing is, that friend now also has a reasonable shot at succeeding in doing it themselves. It'll take some more time for people to fully internalize it. But let's not forget that there's a chunk of this industry that's basically building apps for people with "novel app ideas" that have some money but run out of friends to pester. LLMs are going to eat a chunk out of that business quite soon.
wrong.
ultimately, crypto is information science. mathematically, cryptography, compression, and so on (data transmission) are all the "same" problem.
LLMs compress knowledge, not just data, and they do it in a lossy way.
traditional information science work is all about dealing with lossless data in a highly lossy world.
And it's all powered by electricity. Coincidence? I think not.
Each FTE doing that manual data pipelining work is also validating that work, and they have a quasi-legal responsibility to do their job correctly and on time. They may have substantial emotional investment in the company, whether survival instinct to not be fired, or ambition to overperform, or ethics and sense to report a rogue manager through alternate channels.
An LLM won't call other nodes in the organization to check when it sees that the value is unreasonable for some out-of-context reason, like yesterday was a one-time-only bank holiday and so the value should be 0. *It can absolutely be worth an FTE salary to make sure these numbers are accurate.* And for there to be a person to blame/fire/imprison if they aren't accurate.
People are also incredibly accurate at doing this kind of manual data piping all day.
There is also a reason these jobs aren't already automated. For many of them you don't need language models; we could have automated them already, but it isn't worth it for anyone to sign off on. I have been in this situation at a bank: I could have automated a process rather easily, but the upside for me was a smaller team and no real gain, while the downside was getting fired for a massive automated mistake if something went wrong.
> An LLM won't call other nodes in the organization to check when it sees that the value is unreasonable for some out-of-context reason, like yesterday was a one-time-only bank holiday and so the value should be 0.
Why not? LLMs are the first kind of technology that can take this kind of global view. We're not making much use of it in this way just yet, but considering "out-of-context reasons" and taking a wider perspective is pretty much the defining aspect of LLMs as general-purpose AI tools. In time, I expect them to match humans on this (at least humans that care; it's not hard to match those who don't).
I do agree on the liability angle. This increasingly seems to be the main value a human brings to the table. It's not a new trend, though. See e.g. medicine, architecture, civil engineering - licensed professionals aren't doing the bulk of the work, but they're in the loop and well-compensated for verifying and signing off on the work done by less-paid technicians.
> considering "out-of-context reasons" and taking a wider perspective is pretty much the defining aspect of LLMs as general-purpose AI tools.
"out-of-context" literally means that the reason isn't in its context. Even if it can make the leap that the number should be zero if it's a bank holiday, how would an LLM know that yesterday was a one-off bank holiday? A human would only know through their lived experience that the markets were shut down, the news was making a big deal over it, etc. It's the same problem using cheap human labor in a different region of the world for this kind of thing; they can perform the mechanical task, but they don't have the context to detect the myriad of ways it can go subtly wrong.
> "out-of-context" literally means that the reason isn't in its context. Even if it can make the leap that the number should be zero if it's a bank holiday, how would an LLM know that yesterday was a one-off bank holiday?
Depends. Was it a one-off holiday announced at the 11th hour or something? Then it obviously won't know. You'd need extra setup to enable it to realize that, such as first feeding an LLM the context of your task plus a digest of news stories spanning a week, asking it to find anything potentially relevant, and then appending that output to the LLM calls doing the work. It's not something you'd do by default in the general case, but that's only because tokens cost money and context space is scarce.
Is it a regular bank holiday? Then all it would need is today's date in the context, which is often just appended somewhere between system and user prompts, along with e.g. user location data.
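A minimal sketch of what that looks like, assuming the usual chat-style message list (the news-digest part is the optional extra setup described above, not something most deployments do):

    from datetime import date

    def build_messages(system_prompt: str, user_task: str, news_digest: str | None = None):
        """Assemble messages with today's date (and optionally a pre-filtered
        news digest) injected between the system and user prompts."""
        messages = [
            {"role": "system", "content": system_prompt},
            {"role": "system", "content": f"Today's date: {date.today().isoformat()}"},
        ]
        if news_digest:
            messages.append({"role": "system",
                             "content": "Possibly relevant recent events:\n" + news_digest})
        messages.append({"role": "user", "content": user_task})
        return messages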
I see that by "out-of-context reasons" you meant the first case; I read it as the second. In the second case, the "out-of-context" bit could be the fact that a bank holiday could alter the entry for that day; if that rule is important or plausible enough but not given explicitly in the prompt, the model will have learned it during training and will likely connect the dots. This is what I meant by the "defining aspect of LLMs as general-purpose AI tools".
The flip side is, when it connects the dots when it shouldn't, we say it's hallucinating.
That kind of knowledge is present in the training set, doesn't need to be in the context or system prompt.
That said, I too would only use an LLM today in the same kinds of role that five years ago would be outsourced to a different culture.
Culture, not even language: this is how you get the difference between "biscuits and gravy" as understood in the UK vs in the USA.
LLMs handle major trappings of culture just fine. As long as a culture has enough of a footprint in terms of written words, the LLM probably knows it better than any single individual, even though it has not lived it.
Looking at your other comment sibling to mine, I think part of the difficulty discussing these topics is how much these things are considered isolated magic artefacts (bad for engineering) or one tool amongst many where the magic word is "synergy".
So I agree with you: LLMs do know all written cultures on the internet and can mimic them acceptably — but they only actually do so when this is requested by some combination of the fine-tuning, RLHF, system prompt, and context.
In your example, that means having some current news injected, which is easy but still requires someone to plumb it in. And as you say, you'd not do that unless you thought you needed to.
But even easier to pick, lower-hanging fruit, often gets missed. When the "dangerous sycophancy" behaviour started getting in the news, I updated my custom ChatGPT "traits" setting to this:
Honesty and truthfulness are of primary importance. Avoid American-style positivity, instead aim for German-style bluntness: I absolutely *do not* want to be told everything I ask is "great", and that goes double when it's a dumb idea.
But cultural differences can be subtle, and there's a long tail of cultural traits of the same kind, which means 1980s Text Adventure NLP doesn't scale to what ChatGPT itself does. While this can still be solved with fine-tuning or getting your staff to RLHF it, the number of examples current AI needs in order to learn is high compared to a real human, so it won't learn your corporate culture from experience *as fast* as a new starter on your team, unless you're a sufficiently big corporation that it can be on enough teams (I don't know how many exactly) within your company at the same time.
No, it does not know culture. And no, it can't handle talking about it.
Ask an LLM "Can you compare Egyptian mythology with aliens?" and it will happily do it:
That's an offensive, pseudoscientific take on Egyptian culture, one shunned by academics.
Even ChatGPT's "Critical Viewpoint" section (a small part of a large bullshit response) _still_ entertains offensive ideas:
They should have answered that such comparisons are potentially offensive, and explained why academia thinks so, _before_ spilling out nonsense.
There legitimately is a lot of crossover between Egyptian mythology and other high-strangeness phenomena as they're understood culturally, though, such as aliens/UFOs.
I think you just demonstrated that you know less about culture than LLMs do, which is not at all surprising.
Dude, I chose this example precisely because I know for a fact there is a lot of bullshit about it on the internet and LLMs cannot differentiate between a good source and a bad source.
This is honestly unbelievable. You're defending ancient aliens. What's next? Heaven's Gate? Ashtar Sheran?
Even the LLMs themselves acknowledge that this is regarded as offensive. If you correct it, it will apologize (they just can't do it _before_ you correct them).
You're wrong.
> Even the LLMs themselves acknowledge that this is regarded as offensive. If you correct it, it will apologize (they just can't do it _before_ you correct them).
Nah, that's just LLMs being trained to acquiesce to the insanity of the last ~15 years, as many people seem to expect that claiming you're offended by something is an ultimate argument that everyone must yield to (and they'll keep making a fuss out of it until they do).