Tanstack Start | Browser Agent Benchmark: Comparing LLM models for web automation

wiradikusuma 2 days ago
Since we're on this topic, can anyone suggest a good AI-based tool for exploratory (fuzzy?) web testing?
pixel_popping 2 days ago
It's lacking the best model (Opus 4.5) on the benchmark tho.
- djohnston a day ago
  Yeah but then their own product might not score the highest.
  - pixel_popping 9 hours ago
    That's exactly why I'm pointing it out. The omission feels a bit corrupt, but it's understandable.
    - djohnston 7 hours ago
      tbh I was a bit cranky yesterday. Even if they are #2 on a legit benchmark, that would be impressive.