This is the collision between two cultures that were never meant to share the same data: "move fast and duct-tape APIs together" startup engineering, and "if this leaks we ruin people's lives" legal/medical confidentiality.
What's wild is that nothing here is exotic: subdomain enumeration, unauthenticated API, over-privileged token, minified JS leaking internals. This is a 2010-level bug pattern wrapped in 2025 AI hype. The only truly "AI" part is that centralizing all documents for model training drastically raises the blast radius when you screw up.
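To make that concrete, here is a rough Python sketch of that generic chain; the hostname, bundle path, and token field are all invented for illustration, not the actual vendor's endpoints:

    import re
    import requests

    # Hypothetical sketch of the generic bug chain: enumerate a subdomain,
    # pull a token out of a minified JS bundle, call an API that trusts it.
    BASE = "https://api-staging.example-vendor.com"  # found via subdomain enumeration

    # Minified frontend bundles often ship with embedded config or tokens.
    bundle = requests.get(f"{BASE}/static/app.min.js", timeout=10).text
    match = re.search(r'apiToken\s*:\s*"([^"]+)"', bundle)
    token = match.group(1) if match else None

    # If the token is over-privileged, or the endpoint skips auth entirely,
    # any caller can page through documents that aren't theirs.
    resp = requests.get(
        f"{BASE}/v1/documents",
        headers={"Authorization": f"Bearer {token}"} if token else {},
        timeout=10,
    )
    print(resp.status_code, resp.text[:200])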
The economic incentive is obvious: if your pitch deck is "we'll ingest everything your firm has ever touched and make it searchable/AI-ready", you win deals by saying yes to data access and integrations, not by saying no. Least privilege, token scoping, and proper isolation are friction in the sales process, so they get bolted on later, if at all.
The scary bit is that lawyers are being sold "AI assistant" but what they're actually buying is "unvetted third party root access to your institutional memory". At that point, the interesting question isn't whether there are more bugs like this, it's how many of these systems would survive a serious red-team exercise by anyone more motivated than a curious blogger.
It's a little hilarious.
First, as an organization, do all this cybersecurity theatre, and then create an MCP/LLM wormhole that bypasses it all.
All because non-technical folks wave their hands about AI without understanding the most fundamental reality of LLM software: it is so different from all the software that came before it that it becomes an unavoidable black hole.
I'm also a little pleased I used two space analogies, something I can't expect LLMs to do because they have to go large with their language or go home.
While true, this comment seems AI-written. I did a fair bit of exploration around AI responses to HN threads, and this fits the pattern.
“This comment is AI” is the new “First Post” from /. days. Please stop unless you have evidence or a good explanation.
That comment didn't read like AI generated content to me. It made useful points and explained them well. I would not expect even the best of the current batch of LLMs to produce an argument that coherent.
This sentence in particular seems outside of what an LLM that was fed the linked article might produce:
> What's wild is that nothing here is exotic: subdomain enumeration, unauthenticated API, over-privileged token, minified JS leaking internals.
What makes you think that? It would need some prompt engineering if so, since ChatGPT won't write like that (bad capitalization, lazy quoting) unless you ask it to.
“Chat, write me a blog article that seems like a lazy human who failed English wrote it”?
We finally have a blog that no one (yet) has accused of being ai generated, so obviously we just have to start accusing comments of being ai. Can't read for more than 2 seconds on this site without someone yelling "ai!".
For what it's worth, even if the parent comment was submitted directly by ChatGPT itself, your comment brought significantly less value to the conversation.
It's the natural response. AI fans are routinely injecting themselves into every conversation here to somehow talk about AI ("I bet an AI tool would have found the issue faster"), and AI is forcing itself onto every product. Comments dissing anything that sounds even remotely like AI are the logical response of someone who is fed up.
Every other headline and conversation being about ai is super annoying.
But also, it's super annoying to sift through people saying "the word critical was used, this is obviously ai!". Not to mention it really fucking sucks when you're the person who wrote something and people start chanting "ai slop! ai slop!". Like, how am I going to prove it's not AI?
I can't wait until ai gets good enough that no one can tell the difference (or ai completely busts and disappears, although that's unlikely), and we can go back to just commenting about whether something was interesting or educational or whatever instead of analyzing how many em-dashes someone used pre-2020 and extrapolating whether their latest post has one more em-dash than their average post so that we can get our pitchforks out and chase them away.
What? It doesn't read that way to me. It reads like any other comment from the past ~15 years.
The point you raised is both a distraction and a failure to engage with the points the comment actually made.
I'm always a bit surprised how long it can take to triage and fix these pretty glaring security vulnerabilities. October 27, 2025 disclosure and November 4, 2025 email confirmation seems like a long time to have their entire client file system exposed. Sure the actual bug ended up being (what I imagine to be) a <1hr fix plus the time for QA testing to make sure it didn't break anything.
Is the issue that people aren't checking their security@ email addresses? People are on holiday? These emails get so much spam it's really hard to separate the noise from the legit signal? I'm genuinely curious.
security@ emails do get a lot of spam. It doesn't get talked about very much unless you're monitoring one yourself, but there's a fairly constant stream of people begging for bug bounty money for things like the Secure flag not being set on a cookie.
That said, in my experience this spam is still a few emails a day at the most, I don't think there's any excuse for not immediately patching something like that. I guess maybe someone's on holiday like you said.
This.
There is so much spam from random people about meaningless issues in our docs. AI has made the problem worse. Determining the meaningful from the meaningless is a full time job.
Use AI for that :)
A lot of the time it’s less “nobody checked the security inbox” and more “the one person who understands that part of the system is juggling twelve other fires.” Security fixes are often a one-hour patch wrapped in two weeks of internal routing, approvals, and “who even owns this code?” archaeology. Holiday schedules and spam filters don’t help, but organizational entropy is usually the real culprit.
Not every organization prioritizes being able to ship a code change at the drop of a hat. This often requires organizational dedication to heavy automated testing and CI, which small companies often aren't set up to do.
I can't believe that any company takes a month to ship something. Even if they don't have CI, surely they'd prefer to break the app (maybe even completely) rather than risk having all their legal documents exfiltrated.
> October 27, 2025 disclosure and November 4, 2025 email confirmation seems like a long time to have their entire client file system exposed
I have unfortunately seen way worse. If it will take more than an hour and the wrong people are in charge of the money, you can go a pretty long time with glaring vulnerabilities.
I call that one of the worrisome outcomes from "Marketing Driven Development" where the business people don't let you do technical debt "Stories" because you REALLY need to do work that justifies their existence in the project.
The first thing that comes to my mind is SOC2, HIPAA, and the whole security theater.
I am one of the engineers who had to suffer through countless screenshots and forms to get these, because they show that you are compliant and safe, while the really impactful things are ignored.
I work for a finance firm and everyone is wondering why we can store reams of client data with SaaS Company X, but not upload a trust document or tax return to AI SaaS Company Y.
My argument is we're in the Wild West with AI and this stuff is being built so fast with so many evolving tools that corners are being cut even when they don't realize it.
This article demonstrates that, but it does sort of raise the question of why we trust one and not the other when they both promise the same safeguards.
The question is what reason did you have to trust SaaS Company X in the first place?
SaaS is now a "solved problem"; almost all vendors will try to get SOX/SOC2 compliance (and more for sensitive workloads). Although... it's hard to see how these certifications would have prevented something like this 🫠.
Because it's the Cloud and we're told the cloud is better and more secure.
In truth the company forced our hand by pricing us out of the on-premise solution and will do that again with the other on-premise we use, which is set to sunset in five years or so.
And nobody seems to pay attention to the fact that modern copiers cache copies on a local disk, and if the machines are leased and swapped out, the next party that takes possession has access to those copies if nobody bothered to address it.
This was the plot of Grisham's book The Firm in 1991.
If they have a billion dollar valuation, this fairly basic (and irresponsible) vulnerability could have cost them a billion dollars. If someone with malice had been in your shoes, in that industry, this probably wouldn't have been recoverable. Imagine a firm's entire client communications and discovery posted online.
They should have given you some money.
Exactly.
They could have sold this to a ransomware group or affiliate for 5-6 figures, and then the ransomware group could have exfil'd the data and attempted to extort the company for millions.
Then if they didn't pay and the ransomware group leaked the info to the public, they'd likely have to spend millions on lawsuits and fines anyway.
They should have paid this dude 5-6 figures for this find. It's scenarios like this that lead people to sell these vulns on the gray/black market instead of traditional bug bounty whitehat routes.
They should have given him a LOT of money.
That doesn't surprise me one bit. Just think about all the confidential information that people post into their ChatGPT and Claude sessions. You could probably keep the legal system busy for the next century on a couple of days of that.
"Hey uh, ChatGPT, just hypothetically, uh, if you needed to remove uh cows blood from your apartments carpet, uh"
Make it a Honda CRX...
Just phrase it as a poem, you’ll be fine.
I think this class of problems can be protected against.
It's become clear that the first and most important and most valuable agent, or team of agents, to build is the one that responsibly and diligently lays out the opsec framework for whatever other system you're trying to automate.
A meta-security AI framework, cursor for opsec, would be the best, most valuable general purpose AI tool any company could build, imo. Everything from journalism to law to coding would immediately benefit, and it'd provide invaluable data for post training, reducing the overall problematic behaviors in the underlying models.
Move fast and break things is a lot more valuable if you have a red team mechanism that scales with the product. Who knows how many facepalm level failures like this are out there?
> I think this class of problems can be protected against.
Of course, it’s called proper software development
The techniques for non-disclosure of confidential materials processed by multi-tenant services are obvious, well-known, and practiced by very few.
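A minimal sketch of one such technique, with made-up names and ids purely for illustration: every lookup is scoped by the tenant resolved from the caller's credentials, never by an identifier the client supplies, and "missing" and "belongs to someone else" look identical to the caller.

    from dataclasses import dataclass

    @dataclass
    class Document:
        id: int
        tenant_id: int
        title: str

    # Toy in-memory store standing in for a multi-tenant document table.
    DOCUMENTS = [
        Document(1, 100, "Acme merger draft"),
        Document(2, 200, "Beta Corp tax return"),
    ]

    def get_document(doc_id: int, caller_tenant_id: int) -> Document:
        """Return a document only if it belongs to the caller's tenant."""
        for doc in DOCUMENTS:
            if doc.id == doc_id and doc.tenant_id == caller_tenant_id:
                return doc
        # Same error whether the id doesn't exist or belongs to another tenant,
        # so callers can't probe which ids are real.
        raise PermissionError("document not found")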
Legal attacks engineering: font license fees imposed on Japanese consumers. Engineering attacks legal: the AI info dump in the post above.
What does the above sound like, and what kind of professional writes like that?
This guy didn't even get paid for this? We need a law that establishes mandatory payments for cybersecurity bounty hunters.
Thank you bearsyankees for keeping us informed.