Google DeepMind has unveiled new research highlighting an AI agent that’s able to carry out a swath of tasks in 3D games it hasn’t seen before. The team has long been experimenting with AI models that can win at the likes of Go and chess, and even learn games without being told their rules. Now, for the first time, according to DeepMind, an AI agent has shown it’s able to understand a wide range of gaming worlds and carry out tasks within them based on natural-language instructions.
The researchers teamed up with studios and publishers such as Hello Games (No Man’s Sky), Tuxedo Labs (Teardown) and Coffee Stain (Valheim and Goat Simulator 3) to train the Scalable Instructable Multiworld Agent (SIMA) on nine games. The team also used four research environments, including one built in Unity in which agents are instructed to form sculptures using building blocks. This gave SIMA, described as “a generalist AI agent for 3D virtual settings,” a range of environments to learn from, spanning a variety of graphics styles and both first- and third-person perspectives.
“Each game in SIMA’s portfolio opens up a new interactive world, including a range of skills to learn, from simple navigation and menu use, to mining resources, flying a spaceship or crafting a helmet,” the researchers wrote in a blog post. Learning to follow directions for such tasks in video game worlds could lead to more useful AI agents in any environment, they noted.
The researchers recorded humans playing the games, logging the keyboard and mouse inputs they used to carry out actions. They used these recordings to train SIMA, which has “precise image-language mapping and a video model that predicts what will happen next on-screen.” The resulting agent can interpret a range of environments and take actions to accomplish a given goal.
The researchers say SIMA doesn’t need a game’s source code or API access; it works on commercial versions of a game. It needs just two inputs: what’s shown on screen and directions from the user. Because it uses the same keyboard-and-mouse controls as a human player, DeepMind claims SIMA can operate in nearly any virtual environment.
The agent is evaluated on hundreds of basic skills that can be carried out within 10 seconds or so across several categories, including navigation (“turn right”), object interaction (“pick up mushrooms”) and menu-based tasks, such as opening a map or crafting an item. Eventually, DeepMind hopes to be able to order agents to carry out more complex and multi-stage tasks based on natural-language prompts, such as “find resources and build a camp.”
SIMA performed well across several training setups. The researchers first trained the agent in one game (let’s say Goat Simulator 3, for the sake of clarity) and had it play that same title, using the result as a performance baseline. A SIMA agent trained on all nine games performed far better than one trained on Goat Simulator 3 alone.
What’s especially interesting is that a version of SIMA trained on eight of the games and then tested on the ninth performed, on average, nearly as well as an agent trained only on that ninth game. “This ability to function in brand new environments highlights SIMA’s ability to generalize beyond its training,” DeepMind said. “This is a promising initial result, however more research is required for SIMA to perform at human levels in both seen and unseen games.”
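The held-out test described above follows a leave-one-out pattern: for each game, an agent trains on the remaining eight and is scored on the one it never saw. The Python sketch below shows one way such splits and an average relative score could be computed. It is an illustration under stated assumptions, not DeepMind’s code: only four of the nine titles are named in this article, so the other five are hypothetical placeholders, and `mean_relative_performance` is an assumed way of comparing a held-out agent with a single-game specialist.

```python
from typing import Dict, List, Tuple

# Four titles are named in the article; "game_5".."game_9" are hypothetical placeholders.
GAMES: List[str] = [
    "No Man's Sky",
    "Teardown",
    "Valheim",
    "Goat Simulator 3",
    "game_5",
    "game_6",
    "game_7",
    "game_8",
    "game_9",
]


def leave_one_out_splits(games: List[str]) -> Dict[str, Tuple[List[str], List[str]]]:
    """Map each held-out game to (training games, test games)."""
    splits: Dict[str, Tuple[List[str], List[str]]] = {}
    for held_out in games:
        # Train on every other game; test only on the one left out.
        train = [g for g in games if g != held_out]
        splits[held_out] = (train, [held_out])
    return splits


def mean_relative_performance(
    held_out_scores: Dict[str, float], specialist_scores: Dict[str, float]
) -> float:
    """Average of (held-out agent score / single-game specialist score) across games.

    Assumed metric for illustration: a value near 1.0 means the agent that never
    saw a game did nearly as well as one trained only on that game.
    """
    ratios = [held_out_scores[g] / specialist_scores[g] for g in held_out_scores]
    return sum(ratios) / len(ratios)


if __name__ == "__main__":
    for held_out, (train, _test) in leave_one_out_splits(GAMES).items():
        print(f"hold out {held_out!r}: train on {len(train)} games")
```

Each split trains on eight games, mirroring the article’s description; the real evaluation details (task sets, success criteria) are not specified here.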
For SIMA to be truly successful, though, language input is required. In tests where the agent had no language training or instructions, it tended to fall back on common actions, such as gathering resources, instead of walking where it was meant to go. In such cases, SIMA “behaves in an appropriate but aimless manner,” the researchers said. So, it’s not just us mere mortals. Artificial intelligence models sometimes need a little nudge to get a job done properly too.
DeepMind notes that this is early-stage research and that the results “show the potential to develop a new wave of generalist, language-driven AI agents.” The team expects the AI to become more versatile and generalizable as it’s exposed to more training environments. The researchers hope future versions of the agent will improve on SIMA’s understanding and its ability to carry out more complex tasks. “Ultimately, our research is building towards more general AI systems and agents that can understand and safely carry out a wide range of tasks in a way that is helpful to people online and in the real world,” DeepMind said.