Custom UT Bots Pass “Turing Test”, Win BotPrize

By Jim Rossignol on September 28th, 2012 at 6:00 pm.


Phys are reporting that a University of Texas team has won a share of a $7,000 prize in a competition to create game bots that would pass as human. “The winning bots both achieved a humanness rating of 52 percent. Human players received an average humanness rating of only 40 percent. The two winning teams will split the $7,000 first prize,” says the Phys report. “When this ‘Turing test for game bots’ competition was started, the goal was 50 percent humanness,” the bot’s creator, Risto Miikkulainen, is quoted as saying. “It took us five years to get there, but that level was finally reached last week, and it’s not a fluke.” The bot mimicked humans by pursuing grudges, having poor aim at long range, and by using neural networks to “evolve” its behaviour towards something optimal for the game’s environment.
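
For the technically curious: that “evolve” refers to neuroevolution, the approach Miikkulainen’s group is known for, in which a neural network’s weights are treated as a genome, scored by how well the bot plays, and bred over generations. Here is a minimal, illustrative sketch of the general technique in Python – not the team’s actual code; the network shape, fitness function, and parameters are all invented for the example:

import math
import random

# Toy network dimensions: e.g. a few sensor inputs in, turn/fire signals out.
N_INPUTS, N_HIDDEN, N_OUTPUTS = 4, 6, 2
GENOME_LEN = N_INPUTS * N_HIDDEN + N_HIDDEN * N_OUTPUTS

def forward(genome, inputs):
    """Run a tiny two-layer network whose weights are the genome."""
    w1 = genome[:N_INPUTS * N_HIDDEN]
    w2 = genome[N_INPUTS * N_HIDDEN:]
    hidden = [math.tanh(sum(inputs[i] * w1[i * N_HIDDEN + h]
                            for i in range(N_INPUTS)))
              for h in range(N_HIDDEN)]
    return [math.tanh(sum(hidden[h] * w2[h * N_OUTPUTS + o]
                          for h in range(N_HIDDEN)))
            for o in range(N_OUTPUTS)]

def fitness(genome):
    """Stand-in for 'how human/effective did this bot look in a match?'.
    The real competition scored bots by play; here we just reward a fixed
    target response to a fixed stimulus so the example runs anywhere."""
    out = forward(genome, [0.5, -0.2, 0.9, 0.1])
    return -((out[0] - 0.7) ** 2 + (out[1] + 0.3) ** 2)

def evolve(pop_size=50, generations=100, mutation_rate=0.1):
    pop = [[random.uniform(-1, 1) for _ in range(GENOME_LEN)]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        elite = pop[:pop_size // 5]                   # keep the top 20%
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = random.sample(elite, 2)
            cut = random.randrange(GENOME_LEN)        # one-point crossover
            child = a[:cut] + b[cut:]
            for i in range(GENOME_LEN):               # point mutations
                if random.random() < mutation_rate:
                    child[i] += random.gauss(0, 0.3)
            children.append(child)
        pop = elite + children
    return max(pop, key=fitness)

best = evolve()
print("best fitness:", fitness(best))

In the real competition the fitness would come from played matches (movement, aim, and behaviour that judges rated as human-like) rather than a fixed numeric target.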

Does anyone know of any games that use bots for language responses? I can’t think of any offhand, but it must be going on, and there must be an intriguing state of the art for the “real” Turing Test in games.


83 Comments

  1. Gundato says:

    Uhm… if humans are only averaging 40 percent, I think the metric may be bad.

    • JackShandy says:

      The metric comes from the human players themselves. Perhaps we could improve it by replacing them with some kind of bot-detecting bot.

    • InternetBatman says:

      Replace “hax” with “bots” and that might make sense. Maybe humans find it easier to believe they were beaten by computers than more skilled players.

    • Hug_dealer says:

      The people playing also shouldn’t be familiar with FPS games like that. Perhaps only familiar for long enough to get used to the game.

      I find it pretty easy to spot people using wallhacks and other things like that in games these days. You begin to notice patterns, at least I do. I’d say most of us here would see those patterns much quicker than others would.

      • Capt. Eduardo del Mango says:

        If the test is “make the AI pass for human to gamers” then the only metric you can use is “do gamers think it’s human?”

        • Hug_dealer says:

          That isn’t the goal, which is why gamers should not be used.

          • emertonom says:

            Are you sure that’s not the goal? That’s how the Turing Test normally works.

          • Capt. Eduardo del Mango says:

            “Phys are reporting that a University of Texas team has won a share of a $7,000 prize in a competition to create game bots that would pass as human.”

            Yes, that is the goal – make gamers think something is human. So yes, that’s the goal, and then the only metric you can use is “do gamers think it’s human?”

    • Baines says:

      Humans aren’t always a good choice for detecting humanity.

      If you tell a person that some of his opponents are human and some are AI, then that person is going to start looking for what he believes is AI “behavior”. He will likely look for repeated patterns, “inhuman” skill, and stupid mistakes.

      The problems here are that:

      1) Humans are extremely variable. Some humans will also repeatedly run the same patterns and fall for the same tricks. Some humans have “inhuman” skill. And humans will make stupid mistakes. It is possible for humans to fail a Turing Test.

      2) Humans see what they want to see. Humans see patterns where patterns don’t exist. Humans read behavior and intentions into things and get them wrong. Heck, studies have shown that what people see is literally affected by their beliefs. I don’t mean just the old “see patterns where patterns don’t exist”, but actual individual moments and scenes. Things like an elderly white woman who, upon seeing a black man leaning over a white man, swears (and can pass a polygraph test after the fact) that the black man had a knife, when the guy was not only unarmed but was actively trying to help the white guy.

      Just think of your own multiplayer experiences, whether FPS, DOTA, fighting games, or whatever else. How many cries of “hackz”? How many excuses of “my button didn’t work”? Some of these cries are going to be true, but not all of them are. How many times have you been accused of cheating when you weren’t? Cheating accusations are kind of like intentionally looking for bot behavior: people know that others cheat in online games, so they tend to believe that whatever they see as “abnormal” behavior is a sign of cheating. Even when it isn’t.

      • Hidden_7 says:

        Exactly this. In fact, humans “failing” a Turing test is absolutely vital to how a Turing test works. The traditional Turing test involves someone conversing with one AI and one human, being told this is the case, and asked to pick which one was the AI. A human being mistaken for AI is built into how the test works.

        Given that this test seemed to be in a similar vein, of course the humans would have to score lower than the bots for the bots to win. People are actively told they are playing with bots; if the bots passed for humans, then the humans must have passed for bots.

        You could do a test where people were told that they might be interacting with an AI, and to then decide if they were, and which one it was, but that’s a different test, and one I reckon would be a lot easier for the AI to pass.

        • beetle says:

          I foresee a grim cyberpunk future where humans can no longer compete with artificial intelligences in their ability to appear human. Real ‘human’ connection will only be possible between humans and AIs. AIs will eventually evolve beyond the requirement that they form relationships with humans and eventually form relationships exclusively with other AIs. Humanity sits in the corner and cries, abandoned both by itself and its creations.

          • Baines says:

            It makes me think of moments where reality is stranger than fiction, or where fiction is “realer” than reality.

          • Swanny says:

            Ye gads, man, this is an amazing quote.

      • kyrieee says:

        “Humans aren’t always a good choice for detecting humanity.”
        So what? If you think a bot is human then it has succeeded, whether you can find some other metric to show that it’s human in its behaviour or not. You’re not trying to make it human, you’re only trying to make it appear human to real humans.

  2. Hug_dealer says:

    More human than human?

    That’s pretty cool to read.

  3. CaLe says:

    Does it take into account the skill level of the players? How familiar with the game they are? I think I’d have a much harder time spotting a bot in something like UT, whereas I’d have an easier time doing it in Counter-Strike — simply because I’ve played it a lot more.

    • dsi1 says:

      IDK if RPS linked to the page hosting the videos, but the players they had were even dumber than default bots.

  4. Unaco says:

    There is something of a theory these days that the “Turing Test” was just Turing’s method of getting annoying people to shut-the-f*ck-up at dinner parties and stop badgering him. Regardless of whether that is true or not, it’s… problematic to say the least. It’s an interesting philosophical topic, but I don’t think it has much weight as far as true Artificial Intelligence goes these days.

    I have a slightly different version myself… If the AI can produce a topical, and funny, pun during the conversation.

    I’m seeing one slight problem with this, and its relation to the “Turing Test” (aside from the fact that this has nothing to do with ‘intelligence’). I always thought the goal of beating a Turing Test was to produce an artificial construct that is indistinguishable from a human. If the average human score is 40, and this bot is getting 52, then I’m going to guess that you can distinguish between the two… the bot will be ‘more human’, by whatever the humanness scale is. Either their bot is too good, or the humanness scale is… off.

    On the topic of the Turing Test, I do know that the people of the OpenWorm project (seeking to make a virtual C. elegans) are looking to test their construct with a behavioural Turing Test against real C. elegans.

    http://www.openworm.org/

    • LionsPhil says:

      *click* *whirr*
      Sounds like an elegans approach to evaluation to me.
      *beepboop*

    • MrLebanon says:

      There is an AI that makes puns! It also makes video games (and usually makes the title a pun).

      Hell if I can remember the name of it… it was featured on RPS a while ago

    • c-Row says:

      I have a slightly different version myself… If the AI can produce a topical, and funny, pun during the conversation.

      Can you?

      • Unaco says:

        Can I what?

        • c-Row says:

          Produce a fun and topical pun. Or are you secretly a bot yourself?

          • Unaco says:

            Hey… It’s my Turing Test. I don’t have to convince myself I’m human.

            Just for you though, to stop you c-rowing on about the truth of my biology… I guess we can say they really captured the flag in this competition. (Can I stop there? It’s late and it’s Friday, and I ain’t got much to work with here).

          • Gap Gen says:

            Topical puns can lead to skin allergies, so be careful with that.

          • c-Row says:

            I think they will do for now. I will keep an eye on you, though… human.

          • The Random One says:

            I find your pun insufficient, therefore you only have 35% humanity tops. Better luck next time.

      • Donjo says:

        I’ll tell you about my mother. KABLAMO

    • Bluerps says:

      Indeed, most current AI research isn’t really concerned with things like the Turing test. It’s more “How can we get the machine to do this complicated task that until now only humans could do?” than “How can we make a machine that acts like a human?”. More engineering than philosophy.

      That being said, building a punning machine sounds like an interesting project!

      • Bart Stewart says:

        >> Indeed, most current AI research isn’t really concerned with things like the Turing test.

        True. Most AI research is concerned with “how can I get funding.” That tends to skew what gets attempted toward whatever will gain the favor of the administrative gatekeepers.

        On this UT result, shouldn’t we be calling this a “limited Turing test” at best? The breadth of the human experience is vastly larger than human-shaped objects trying to kill each other.

        It doesn’t hurt to have experiments like this one — you have to make progress somewhere — but it’s nothing like Turing’s actual description of intellectually and emotionally unbounded interactions. The research into chatbots seems more promising as a path toward success in a general Turing test.

        • Bluerps says:

          ‘Most AI research is concerned with “how can I get funding.”’
          Oh yes. Believe me, I know. Actually, that statement is still true when you remove ‘AI’ from it…

          And yes, as a Turing test, playing UT is pretty limited. It is much easier to get a computer to play a game like a human than to get it to communicate like one.

          • Mike says:

            It’s also harder to make a soufflé that can discuss Shakespeare with you before you eat it. That doesn’t make less difficult things uninteresting or simple by comparison, though.

          • Bluerps says:

            Sure. I didn’t say that, did I?
            It’s not that they didn’t do anything impressive; it’s just not really a successful Turing test.

          • Koozer says:

            When I was studying for my Geology degree, there was a running joke among researchers that to be guaranteed funding, all you had to do was work the words ‘global warming’ into your proposal title somehow.

        • Mike says:

          You’re basing this “Most AI research is just to get funding” on what, exactly? Sounds like a bit of a sweeping generalisation to me.

          • Ragnar says:

            No, you misunderstand. He’s saying that most AI research is concerned with getting funding. Getting the funding isn’t a goal of the research, but an obstacle to overcome.

    • Nate says:

      “I have a slightly different version myself… If the AI can produce a topical, and funny, pun during the conversation.”

      Something like that would be necessary to pass the Turing Test. The computer would need to come up with an appropriate response to “Make a funny pun” and “No, for real this time.” Or, a response at least as appropriate as a human subject would give.

      The Turing Test is only a valid test in the presence of skepticism by the tester, who has to know that there’s the potential for a computer to be replying, and has to be motivated to want to tell the difference. Of course, we ignore that part of it when we’re getting excited about this or the other chat bot :)

      “If the average human score is 40, and this bot is getting 52, then I’m going to guess that you can distinguish between the two.”

      If you read the attached article, it’s not like the test was done very rigorously, probably in the name of fun. For instance, a bot could have behaved very, very bot-like and still won, provided it was good enough at dodging the bot-marking gun. In an actual Turing test, you should see false positives and accurate identifications adding up to 100% (52 + 40 = 92, so what was identified in the other 8% of cases?). Still, it’s not like you’re likely to ever see a perfect 50-50 split, and anything relatively close to 50-50 should be seen as passing.

      Moreover, the test needs to be repeated. It’s perfectly possible for a computer to pass a Turing test at one point in time and fail later, as people learn new bot-like behaviors to identify. If a computer is being seen as more human most of the time, then it’s perfectly reasonable to expect that people would adjust to that and, given time, maybe reverse their finding – but probably only when you’re seeing false positives much more frequently than 50 out of 92 (?) times.
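
      To make the arithmetic concrete, here is a toy illustration (the tallies below are invented for the example, not the competition’s data): a “humanness” rating is just the share of judgements on a given target that said “human”.

      # Toy sketch: how a "humanness" rating falls out of judge verdicts.
      # The tallies are hypothetical -- chosen only to reproduce the
      # reported 52% / 40% figures.

      judgements_of_bot   = {"human": 26, "bot": 24}   # 50 verdicts on the bot
      judgements_of_human = {"human": 20, "bot": 30}   # 50 verdicts on a human

      def humanness(tally):
          total = tally["human"] + tally["bot"]
          return 100.0 * tally["human"] / total

      print("bot humanness:   %.0f%%" % humanness(judgements_of_bot))    # 52%
      print("human humanness: %.0f%%" % humanness(judgements_of_human))  # 40%

      Note that each target’s “human” and “bot” verdicts already sum to 100% on their own; the bot’s 52% and the humans’ 40% are ratings of different targets, which is why the published figures don’t complement each other.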

  5. Lord Custard Smingleigh says:

    I, for one, welcome our Unreal Tournament-playing overlords.

    • tungstenHead says:

      They are…

      GODLIKE!

      • Danny252 says:

        It’s good that you welcome them, for I fear they may be…

        UNSTOPPABLE.

        • Vagrant says:

          Let’s just hope they don’t go on a…

          RAMPAGE!

          • gamma says:

            …and finish us off with a DOUBLE, MULTI (or worse yet) HEADSHOT MEGAKILL.

            (still demoing Rankin to this very day)

          • hamburger_cheesedoodle says:

            If the AI is organic, it may get WICKED SICK and the problem will solve itself, War of the Worlds style.

          • Ragnar says:

            But if they’re not, they could go on a KILLING SPREE.

  6. Hoaxfish says:

    The bots have also evolved a variety of human-like responses, and have been seen telling other players “lrn2play noob” (accompanied by a detailed description of how much they hate them), calling the Call of Duty series “a linear un-game sort of nonsense”, and explaining why DRM is bad.

    • x1501 says:

      I’m not that sure that “lrn2play noob” qualifies as a human-like response.

      • DarkFenix says:

        Indeed, if it doesn’t at least also call them a fag it’s not going to convince anyone.

  7. RogB says:

    It’s nice to see some people are still doing this, though sadly it’s far less common.
    I remember the days of Quake 1, when people were competing to make the best deathmatch AI bots, and it was interesting to see their progression and the way they developed (like using breadcrumbs dropped by the player so the bot could use the same routes the players did).
    Randomised ‘chat’ taunts were fun too.

    As far as I remember, the Reaper Bot (one of the better ones) was by Steven Polge, who went on to do the Unreal Tournament AI. Don’t know if he’s still doing it, though!

    • LionsPhil says:

      The UT bot AI is outstanding. I have tormented the 2004 version with maps of narrow beams and awkward paths, and it demonstrates an impressive degree of ad-libbing when it falls off the straight and narrow of map hinting and has to fend for itself.

  8. Acerbjorn says:

    This doesn’t seem that amazing. I mean, the spambots on this site passed the Turing test ages ago, by my reckoning. Joking aside, this is rather cool.

  9. Skabooga says:

    They move like we do.

  10. deadly.by.design says:

    They pass the Turing Test, but do they pass the Teabag Test?

    • RogB says:

      if (human_killed == true)
      {
          moveto_human();
          crouch();
          crouch();
          crouch();
          chat("how do you like THEM apples?");
      }

      • Askeladd says:

        That’s probably how the minds of most people who feel the need to ‘teabag’ work.

  11. riadsala says:

    “Does anyone know of any games that use bots for language responses?”

    Starship Titanic!

  12. GameCat says:

    So we need Voight-Kampff test now, right?

    • Lord Custard Smingleigh says:

      Reaction time is a factor in this, so please answ… What the… FLAK CANNON! EVERYBODY DOWN!

    • clive dunn says:

      Let me tell you about my mother….

    • Andy_Panthro says:

      You are in a desert, walking along, and suddenly you look down and see a tortoise, and you see it crawling towards you…

      You reach down and flip that tortoise on its back…

  13. pupsikaso says:

    “The bot mimicked humans by pursuing grudges, having poor aim at long range,”

    By that definition I’d fail the test too… =/??

  14. lijenstina says:

    The problem with the metric is that as soon as there is a situation the AI is not programmed for, it’s quite easy to spot the error. Let’s say that in a testing map there is a deliberate bug – like a spot where the player falls through the floor, out of the map, and dies. Human players will start avoiding it, while the AI characters will keep walking into it, because their AI has no scheme to recognize it.

    Also, the more complex the gameplay, the easier it is to spot an AI character. For instance, if the player can jump over some rocks as a shortcut to reach a location much faster, an AI character that doesn’t have a path there will take the long way around on its navigation map. Or drop an item that blocks the way, and the NPC will get stuck on it, etc.

    Anyway, the metric is subjective. The control sample (human players identified as bots) suggests that there is something wrong with the test.

    • EPICTHEFAIL says:

      Yeah, I haven’t seen any mention of pathfinding AI (then again, I’m the kind of guy who skims Cliff’s Notes). However, the metric can’t possibly not be subjective, since it is testing whether a member of a demographic that accuses people of being bots anyway can differentiate between an actual bot and a fellow meatbag.

    • hamburger_cheesedoodle says:

      Part of what might make UT2004 easier to program bots for is the fact that their routes are preprogrammed by the level designer (not the selection of which route to take, but which routes they can take – and in objective-based games like CTF, good level designers can give additional weight to certain paths), so that’s one less thing for the team who made these bots to work on. The upside is that this means the mapper can add in behaviours like rocket/hammer jumping, translocator movement, and specific trick jumps/dodges. The downside is that if players find new or better routes that the level designer didn’t anticipate, the bots will never take those routes.

      Your specific example of a bot falling through the floor is fairly unlikely to happen, as bots will ignore path nodes that tell them to go places they cannot, even in stock UT2004. I suppose you could construct the map in such a way that the bots would intentionally get themselves killed over and over, but you would honestly have to put a bit of work into it. The AI programming for UT is pretty robust.
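
      To make that concrete, here is a rough sketch of the idea (illustrative Python, not UnrealScript or Epic’s actual implementation; the node names and costs are invented): the bot only ever plans over edges the mapper authored, so cheapening an edge is how a designer biases bots toward a trick route, and a player-discovered shortcut simply doesn’t exist in the graph.

      import heapq

      # Hypothetical designer-authored node graph: (neighbour, cost) pairs.
      # The mapper made the rocket-jump edge cheap to bias bots toward it.
      PATH_NODES = {
          "spawn":       [("corridor", 10), ("rocket_jump", 4)],
          "corridor":    [("flag", 10)],
          "rocket_jump": [("flag", 10)],
          "flag":        [],
      }

      def best_route(start, goal):
          """Plain Dijkstra over the authored graph -- the bot cannot
          take any route the designer didn't lay down as edges."""
          frontier = [(0, start, [start])]
          seen = set()
          while frontier:
              cost, node, path = heapq.heappop(frontier)
              if node == goal:
                  return cost, path
              if node in seen:
                  continue
              seen.add(node)
              for nxt, step in PATH_NODES[node]:
                  heapq.heappush(frontier, (cost + step, nxt, path + [nxt]))
          return None

      print(best_route("spawn", "flag"))   # (14, ['spawn', 'rocket_jump', 'flag'])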

      • LionsPhil says:

        If memory serves, stock UT2004 bots that fall off the mapper’s paths will crouch-walk-explore, attempting to jump over any low obstacles they encounter, until they can find their way back onto the path. If they have an objective, like a flag to go for, they’ll bias toward wandering in that direction.

        There are some amusing debug modes, from straight-up intent/path-and-LoS glowy-lines visualisations, to a mode that will step through all the jumps in the map and brute-force test them in all the ways you can jump in UT2004 (shield gun, translocator, double, dodge…).

        Clever little blighters. Covers a multitude of mapper sins; much better than the (much older, but still well-regarded!) HL1 ones that just freeze up if you take their nodes away.

  15. hamburger_cheesedoodle says:

    Now I can’t help but want to play against these bots myself to see just how good they are.

  16. Kakkoii says:

    DotA 2. The bots in that game have been going through constant revisions every week for over a year now. They act incredibly human, and will call out typical voice responses to tell teammates about what’s happening or what they are about to do, such as “ganking bot”, “pushing mid”, “mid missing”.

    The multitude of spells and items with abilities they have to deal with creates much more complex computational problems than an FPS does. And the fog of war creates even more issues. The dev team on that game have done an amazing job; I’d love to see their bots put through this test.

  17. Grey Ganado says:

    Fun fact: I’ve never passed the Turing Test.

    • Lord Custard Smingleigh says:

      Are you kidding? It’s as easy as $NOUN_4.

  18. namad says:

    Multiplayer computer AI that is capable of competing in a human way isn’t very Turing, since the Turing test is mostly about having the awareness to interact with humans on a human level, not on a game level (simpler games have had very “human” AIs for much longer, which makes this sort of test… well, not quite the same thing as a Turing test) – like, say, making small talk, or properly taunting opponents by claiming everything is OP every 3 seconds :-p

  19. Nick says:

    of course, some people are easier to fool than others

    http://penny-arcade.com/comic/2002/10/04