Watch an AI learn to play Mario, live

On TikTok, in between “get ready with me” videos, life hacks and memes, some robots are working on a challenge that most of us have faced at some point in our lives: beating Super Mario World. Over the past week, users have been live streaming AI attempts to learn to play Mario, and for one robot in particular, it’s going pretty well. His name is Rupert, and he just beat level 2.

The AI’s strategy will be familiar to anyone who remembers their first time holding a Super Nintendo controller. Rupert runs, jumps, hits enemies, falls off cliffs and dies, again and again and again. Every time he dies, Rupert tries again. Usually, he makes almost the same moves that got him killed in the last round. But if you watch long enough, you’ll notice that Rupert is evolving and getting better. He’s learning.

“It’s a program built to simulate natural selection with a neural network,” said Join the PCMasterRace, the TikTok user responsible for Rupert, who asked that his real name not be used. (PCMasterRace is the offensive name of a subreddit about desktop computers.)

In other words, Rupert is a system of machine learning algorithms that gets better by learning from its own mistakes. Rupert has a defined objective: get to the other end of the level. It knows what buttons it can press, and it can see what is happening on the screen. (You can actually see what Rupert “sees” at the top left of the video.) But unlike a human Mario player, the AI can’t just assume that it should avoid Koopas or stay away from cliff edges. All Rupert has is positive and negative feedback. Basically, Rupert starts out trying random things. It remembers what did and didn’t work, and it improves its strategy over time.

Rupert is modeled on evolution in the sense that it works in “species” and “generations”. The AI tries a specific strategy with each species, which lasts for about two to six runs. After every 50 to 100 species, the AI combines what it has learned into a new “generation”.

As the AI plays, it gets a “fitness” score. Mario’s fitness increases based on how far to the right he goes and how fast he gets there. Species with high fitness are selected to “breed” for future generations, meaning the AI builds on behaviors and patterns that work and tries new variations on them. This allows its decision making to become more sophisticated and complex over time.
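The score-and-breed loop described above can be sketched in a few lines of Python. This is a toy illustration, not MarI/O's code: MarI/O evolves neural networks, while this sketch evolves fixed lists of button presses and skips the grouping into species. The level length, the scoring weights, the mutation rate and every function name here are assumptions made up for the example.

```python
import random

BUTTONS = ["left", "right", "up", "down", "A", "B", "X", "Y"]
GOAL = 15  # hypothetical level length, measured in steps to the right


def random_genome(rng, length=30):
    """A 'strategy' here is just a fixed list of button presses."""
    return [rng.choice(BUTTONS) for _ in range(length)]


def fitness(genome):
    """Toy score: reward distance to the right, penalize frames used."""
    x = 0
    frames = 0
    for button in genome:
        frames += 1
        if button == "right":
            x += 1
        elif button == "left":
            x = max(0, x - 1)
        if x >= GOAL:  # reached the end of the level, stop the clock
            break
    return 10 * x - frames


def breed(parent_a, parent_b, rng, mutation_rate=0.05):
    """Splice two parents at a random point, then randomly mutate some presses."""
    cut = rng.randrange(1, len(parent_a))
    child = parent_a[:cut] + parent_b[cut:]
    return [rng.choice(BUTTONS) if rng.random() < mutation_rate else b
            for b in child]


def evolve(generations=30, population_size=40, seed=0):
    """Return the best fitness seen in each generation."""
    rng = random.Random(seed)
    population = [random_genome(rng) for _ in range(population_size)]
    best_per_generation = []
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        best_per_generation.append(fitness(ranked[0]))
        # The fittest quarter survives unchanged and breeds the rest.
        parents = ranked[: population_size // 4]
        children = [breed(rng.choice(parents), rng.choice(parents), rng)
                    for _ in range(population_size - len(parents))]
        population = parents + children
    return best_per_generation


if __name__ == "__main__":
    print(evolve())
```

Because the fittest strategies are carried over unchanged, the best score in this sketch can never drop from one generation to the next. That is a design choice of the example, not a claim about how MarI/O behaves.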

It’s slow going, but it works. It took Rupert just 57 generations to beat the first level, and viewers celebrated his success in the comments.

Rupert, along with another TikTok-streaming AI Mario player named George, is running an open-source program called MarI/O. It was created by coder and live-streamer Seth Hendrickson, who goes by SethBling online. MarI/O is not new. Hendrickson made it public years ago, but the program has found new life in an era when the tech industry wants to believe that AI will soon take over the world.

MarI/O is simpler than a system like ChatGPT, but it’s a window into how AI models work. These AI tools throw spaghetti at the wall, and humans design the system to judge whether each attempt was better or worse than the last. The attempts get better as time goes on. Now imagine that happening millions or billions of times. Hendrickson gives a more detailed explanation in one of his videos.

With ChatGPT, it’s exponentially more complicated. MarI/O doesn’t have that many options: left, right, up, down, A, B, X, and Y. The English language, on the other hand, has hundreds of thousands of words, countless ways to arrange those words, and a theoretically infinite number of ideas to express. MarI/O is much simpler than ChatGPT, and the technology is fundamentally different, but if you understand how MarI/O works, you can extrapolate from it to a useful understanding of chatbot technology.

Rupert, unfortunately, is just a little guy. It’s doing its best, but Rupert is going to struggle as he gets further into the game. MarI/O’s system only rewards the AI based on how far to the right of the screen Mario gets, but in some levels of Super Mario World, you have to go up to reach the goal instead of right.

“However, I plan to modify it so that it can climb vertical structures better,” said Join the PCMasterRace.
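To see why a rightward-only score struggles with vertical levels, here is a small Python comparison. It is a hypothetical sketch: the two scoring functions, the `height_weight` parameter and the sample paths are invented for illustration, and nothing here reflects the change the streamer actually has in mind.

```python
def rightward_fitness(path):
    """Score a run by the farthest-right x position it reached."""
    return max(x for x, _ in path)


def right_and_up_fitness(path, height_weight=1.0):
    """Hypothetical alternative: also give credit for height gained."""
    return max(x + height_weight * y for x, y in path)


# Each path is a list of (x, y) positions visited during one run.
climb = [(0, 0), (1, 0), (1, 5), (1, 10)]  # heads up a vertical shaft
dead_end = [(0, 0), (3, 0), (6, 0)]        # runs right into a wall

print(rightward_fitness(climb), rightward_fitness(dead_end))
print(right_and_up_fitness(climb), right_and_up_fitness(dead_end))
```

Under the rightward-only score, the dead-end run beats the climb, so breeding would favor the wrong behavior. Once height counts too, the ranking flips.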
