SIMA 2: An agent that plays, reasons, and learns with you in virtual 3D worlds(deepmind.google)
140 points by meetpateltech 7 hours ago | 50 comments
  • vessenes an hour ago

    OK, AI playing video games is cool. But you know what's really really cool? It looks like SIMA 2 is controlling the mouse and reading the screen at something approaching 30+fps. WANT. Computer use agents are so slow right now, this is really something. I wonder what the architecture is for this.

    • Workaccount2 an hour ago | parent

      I desperately want an AI agent that can use my phone for me. Just something that takes instructions for each screen and executes them.

      "Open Chrome"

      "Go to xyz.com"

      "open hamburger menu"

      "Click login"

      etc. etc.
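
A minimal sketch of that screen-by-screen loop, with toy keyword rules standing in for the model. Everything here is hypothetical: a real agent would consult a vision-language model with a screenshot of the current screen and drive the phone via adb or an accessibility service.

```python
def plan_action(instruction: str) -> dict:
    """Map one natural-language step to a low-level UI action.

    Toy keyword rules; a real agent would look at the screen and reason."""
    text = instruction.lower()
    if text.startswith("open "):
        return {"action": "launch_app", "target": instruction[5:]}
    if text.startswith("go to "):
        return {"action": "navigate", "target": instruction[6:]}
    if text.startswith("click ") or text.startswith("tap "):
        return {"action": "tap", "target": instruction.split(" ", 1)[1]}
    # Anything unrecognized goes back to the user instead of guessing.
    return {"action": "ask_user", "target": instruction}

script = ["Open Chrome", "Go to xyz.com", "Click login"]
plans = [plan_action(step) for step in script]
```

The interesting part of the real system would be replacing `plan_action` with a model call that grounds "login" to actual screen coordinates.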

      • tantalor 41 minutes ago | parent

        Isn't that what the voice a11y tools have been doing for years? Why do you need AI for that?

        https://support.google.com/accessibility/android/answer/6151...

        https://support.apple.com/en-us/111778

      • onion2k 23 minutes ago | parent

        Droidrun did a Show HN recently. It's exactly that.

    • almosthere an hour ago | parent

      It's even cooler if humans find something to be excited about in this world, since AI is replacing everything we do.

  • golol 6 hours ago

    The gap between high-level and low-level control of robots is closing. Right now, thousands of hours of task-specific training data are being collected and trained on to create models that can control robots to execute specific tasks in specific contexts. This essentially turns the operation of a robot into a kind of video game, where inputs are only needed in a low-dimensional, abstract form, such as "empty the dishwasher" or "repeat what I do" or "put your finger in the loop and pull the string". This will be combined with high-level control agents like SIMA 2 to create useful real-world robots.

    • catgary an hour ago | parent

      I work on a much easier problem (physics-based character animation) after spending a few years in motion planning, and I haven’t really seen anything to suggest that the problem is going to be solved any time soon by collecting more data.

      • wordpad 25 minutes ago | parent

        Why? Physics of large discrete objects (such as a robot) isn't very complicated.

        I thought it's fast accurate OCR that's holding everything back.

        • markisus 4 minutes ago | parent

          The problem becomes complicated once the large discrete objects are not actuated. Even worse if the large discrete objects are not consistently observable because of occlusions or other sensor limitations. And almost impossible if the large discrete objects are actuated by other agents with potentially adversarial goals.

          Self driving cars, an application in which physics is simple and arguably two dimensional, have taken more than a decade to get to a deployable solution.

  • gs17 3 hours ago

    I hope we can get some (ideally local) version of this that we can use as a "gaming minion". There are a lot of games I probably would have played more if I could have delegated the grind. Even if they're not that competent, it adds a little to the fun.

    • a2128 an hour ago | parent

      I've always wanted an AI that can play my video games for me, so that I can spend my time doing more fun and fulfilling things, like cleaning the toilet, folding my laundry, washing my dishes, taking out the garbage. Now I will no longer have to worry about the annoying chores in life, like drawing art, writing poetry, or playing video games

    • ragequittah 15 minutes ago | parent

      This is what the WoW bots were. They had a crazy level of agency even without AI.

    • efficax 2 hours ago | parent

      Sorry, this is kind of nuts to me. You want something to play video games for you because the video game isn't fun? Just play a game that is fun. The point of the game is to play it.

      • qoez 2 hours ago | parent

        It could be fun in a Factorio sense. Maybe the whole game becomes delegating to a bunch of smart robots, handling organization, etc.

      • CuriouslyC 2 hours ago | parent

        I mean, that's literally an RTS?

    • hoherd an hour ago | parent

      One thing I do with games is automate the grind. To me, that is part of the fun. I have built LEGO robots to press a sequence of buttons repeatedly, and programmed microcontrollers using CircuitPython to press a series of keys or click the mouse at given intervals to grind various in-game currency and such. It's so common for me to do these kinds of things that I now instinctively look for places in gameplay that I can automate. I haven't done anything as complicated as using computer vision to look at the screen and respond to it, but I did see that Anthony Sottile did this to catch shiny Pokémon (https://youtu.be/-0GIY5Ixgkk), and doing something like this has been out there on my horizon.
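
The button-press rigs described above boil down to a timed loop. Here is a toy plain-Python version with the key-emitting call abstracted into a callback, so the timing logic is testable; on a CircuitPython board, that callback would be a USB HID keyboard write instead.

```python
import time

def run_macro(sequence, press, repeats=1, interval=0.0, sleep=time.sleep):
    """Press each key in `sequence`, `repeats` times over, waiting
    `interval` seconds between presses. `press` is whatever actually
    emits the key (a HID write on a microcontroller; a stub in tests)."""
    for _ in range(repeats):
        for key in sequence:
            press(key)
            sleep(interval)

# Record presses into a list instead of sending them anywhere.
pressed = []
run_macro(["A", "A", "B", "START"], pressed.append, repeats=2,
          sleep=lambda s: None)
# pressed now holds the four keys, twice over
```

Injecting `sleep` as a parameter is what lets the same loop run on real hardware and in a fast test.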

    • JLCarveth 3 hours ago | parent

      I would love Minecraft with more intelligent villagers I could boss around to mine and build for me.

      • goda90 16 minutes ago | parent

        You should look into modding. There have got to be a ton of automation and NPC scripting mods out there without any sort of AI model necessary.

      • yeasku an hour ago | parent

        So factorio?

        • JLCarveth an hour ago | parent

          No.

    • 2OEH8eoCRo0 2 hours ago | parent

      Agree. It would be cool to populate my Valheim server with a bunch of agents that are in competition.

  • Workaccount2 7 hours ago

    >We’ve observed that, throughout the course of training, SIMA 2 agents can perform increasingly complex and new tasks, bootstrapped by trial-and-error and Gemini-based feedback.

    >In subsequent training, SIMA 2’s own experience data can then be used to train the next, even more capable version of the agent. We were even able to leverage SIMA 2’s capacity for self-improvement in newly created Genie environments – a major milestone toward training general agents across diverse, generated worlds.

    Pretty neat. I wonder how that works with Gemini; I suppose SIMA is a model (agent?) that runs on top of it?
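
One plausible reading of the quoted self-improvement loop, sketched with stand-in functions (the actual SIMA 2 training setup is not public; every name here is hypothetical):

```python
def self_improve(agent_policy, tasks, critic, train, rounds=3):
    """Play -> score -> retrain, as the quote describes: the agent attempts
    tasks, a critic model (Gemini, per the post) scores the attempts, and
    well-scored trajectories become training data for the next version."""
    for _ in range(rounds):
        experience = []
        for task in tasks:
            trajectory = agent_policy(task)    # agent attempts the task
            score = critic(task, trajectory)   # Gemini-style feedback signal
            if score > 0.5:                    # keep only successful attempts
                experience.append((task, trajectory))
        agent_policy = train(agent_policy, experience)  # next agent version
    return agent_policy

# Trivial stubs just to show the shape of the loop:
improved = self_improve(
    agent_policy=lambda task: ["step1", "step2"],
    tasks=["build a shelter"],
    critic=lambda task, trajectory: 1.0,
    train=lambda policy, experience: policy,
    rounds=2,
)
```

The "newly created Genie environments" bit would correspond to growing `tasks` with generated worlds between rounds.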

    • FuckButtons 5 hours ago | parent

      That’s what it sounded like to me, a plain text interface between two distinct systems.

      • kridsdale1 38 minutes ago | parent

        That’s what Claude Plays Pokémon is.

  • tschellenbach an hour ago

    It's like the Factorio moment where you unlock the roboport. No more manual changes to the world; drone swarms build housing, roads, bridges, parks, etc. So exciting.

  • oersted 6 hours ago

    I get why they do it; they are a business. I just wish Google would get off their ivory tower and build in the open more, like they used to (did they? maybe I'm misremembering...).

    They've acquired this bad habit of keeping all their scientific experiments closed by default and just publishing press releases. I wish it was open-source by default and closed just when there's a good reason.

    Don't get me wrong, I suppose this is more of a compliment. I really like what they are doing and I wish we could all participate in these advances.

    • chankstein38 2 hours ago | parent

      Same! I want to play with this so bad!

    • singularity2001 4 hours ago | parent

      Dreamer v3 was open, v4 coming soon?

  • eminence32 4 hours ago

    This is obviously just a research project, but I do wonder about the next steps:

    * After exploring and learning about a virtual world, can anything at all be transferred to an agent operating in the real world? Or would an agent operating in the real world have to be trained exclusively or partially in the real world?

    * These virtual worlds are obviously limited in a lot of important ways (for example, character locomotion in a game is absolutely nothing like how a multi-limbed robot moves). Will there eventually need to be more sophisticated virtual worlds that more closely mirror our real world?

    * Google seems clearly interested in generalized agents and AGI, but I'm actually somewhat interested in AI agents in video games too. Many video games have companion NPCs that you can sort of give tasks to, but in almost all cases, the companion NPCs are nearly uncontrollable and very limited in what they can actually do.

    • nharada 4 hours ago | parent

      The end goal is to marry the lessons learned about HOW to learn in a virtual world with a high fidelity world model that's currently out of reach for this generation of AI. In a year or two once we have a world model that's realistic enough and fast enough, robots will be trained there and then (hopefully) generalize easily to the real world. This is groundwork trying to understand how to do that without having the models required to do it for real.

    • mkoubaa 2 hours ago | parent

      Look into the sim2real problem in robotics

  • ukuina 6 hours ago

    At 0:52 in their demo video, there is a grammatical inconsistency in the agent's text output. I therefore suspect the annotations in the video were created by humans after the fact. Is Google up to their old marketing/hyping tricks again?

    > SIMA 2 Reasoning:

    > The user wants me to go to the ‘tomato house’. Based on the description ‘ripe tomato’, I identify the red house down the street.

    • vessenes an hour ago | parent

      The scene just before the one you describe shows the user writing "ripe tomato" in the description - you can see it in the video. The summary elides it, but the "ripe tomato" instruction is also clearly part of the context.

    • m_w_ 6 hours ago | parent

      I can't speak to the content of the actual game being played, but it wouldn't surprise me if there was an in-game text prompt:

      > "The house that looks like a ripe tomato!"

      that was transformed into a "user prompt" in a more instructional format

      > "Go to the tomato house"

      And both were used in the agent output. At least the Y-axes on the graphs look more reasonable than some other recent benchmarks.

  • theLiminator 3 hours ago

    Would be cool to see if they could make it play StarCraft too and pit it against AlphaStar.

    • falcor84 3 hours ago | parent

      From what I see, SIMA only focuses on games where you control a single avatar from a 1st/3rd-person perspective, and I would assume that switching to a non-embodied game where you need to control a whole army at once would require significant retraining.

    • gs17 3 hours ago | parent

      I'm almost 100% confident AlphaStar would win that match, but I'd love to watch it.

  • JohnMakin 5 hours ago

    Isn't most of this demo No Man's Sky? The voiceover doesn't make it clear that the world is not generated by SIMA.

    • xnx 4 hours ago | parent

      It's hard to keep up with the many different models and pace of progress.

      Genie 3 is Google's world generating model: https://deepmind.google/blog/genie-3-a-new-frontier-for-worl...

    • tintor 2 hours ago | parent

      This is not a world generating model.

      It is a game playing model.

      • JohnMakin an hour ago | parent

        And my post is saying that if you don't really know better, from the narration, you'd think google also generated the world. At least that was my impression, and I'm vaguely familiar with these things.

    • lawlessone 4 hours ago | parent

      If it can get through those lengthy, glitchy NMS story-mission tutorials quickly, it's already a superintelligence.

      As much as some AI annoys me, this would be great for making games more accessible.

  • woeirua 4 hours ago

    Yet another blogpost that looks super impressive, until you get to the bottom, look at the charts assessing held-out task performance on ASKA and MineDojo, and see that it's still a paltry 15% success rate. (Holy misleading chart, Batman!) Yes, it's a major improvement over SIMA 1, but we are still a long way from this being useful for most people.

    • Workaccount2 4 hours ago | parent

      To be fair, it's 65% on all tasks (with a 75% human baseline) and 15% on unseen environments. They don't provide a human baseline for that, but I'd imagine it's much more than 15%.

      • woeirua 3 hours ago | parent

        It really feels like we are determined to simulate every possible task in every possible environment instead of building true intelligence.

        • falcor84 2 hours ago | parent

          I personally am extremely impressed by it reaching 15% on unseen environments. Note that just this year, we were surprised that LLMs became capable of making any progress whatsoever in GBA Pokémon games (which have significantly simpler worlds and control schemes).

          As for "true intelligence" - I honestly don't think that there is such a thing. We humans have brains that are wired based on our ancestors evolving for billions of years "in every possible environment", and then with that in place, each individual human still needs quite a few years of statistical learning (and guided learning) to be able to function independently.

          Obviously I'm not claiming that SIMA 2 is as intelligent as a human, or even that it's on the way there, but based on recent progress, I would be very surprised if we don't see humanoid robots using approaches inspired by this navigating our streets in a decade or so.

        • Gooblebrai 2 hours ago | parent

          I'm curious what your definition of "true intelligence" is.

  • tinfoilhatter 5 hours ago

    [flagged]

    • dang 4 hours ago | parent

      Could you please stop posting flamebait and breaking the site guidelines? You've unfortunately been doing it repeatedly, including this dreadful thread from a couple weeks ago: https://news.ycombinator.com/item?id=45781981. I realize the other person was doing it also, but you (<-- I don't mean you personally, but all of us) need to follow the rules regardless of what other people are doing.

      Comments like what your account has been posting are not what this site is for, and destroy what it is for, so if you wouldn't mind reviewing https://news.ycombinator.com/newsguidelines.html and taking the intended spirit of the site more to heart, we'd be grateful.

    • jandrese 4 hours ago | parent

      It seems pretty clear to me that they're trying to develop AGI humanoid assistants/workers without the messy and expensive real-world hardware - basically approaching the problem from the other end than a company like Tesla, which built a robot and is now trying to figure out how to make a computer drive it without needing constant hand-holding.