Getting Better At Hearthstone With Computers

Dr. Elie Bursztein is a computer security expert at Google who has spent much of his free time over the past few months on Blizzard’s card game Hearthstone. He and his wife Celine used math and machine learning to understand more about the game, detailing their results in blog posts. They started out using an algorithm to identify cards that were “undervalued” – where you get the most power for your mana. Separately, they began analysing games and decklists to build a model that predicts which cards an opponent will play based on what has already happened. It ended up pretty dang accurate, all right.

He gave a talk about all this at the DEF CON 22 hacking conference last month, and now it’s online for us to watch.

There’s a lot of stuff I find interesting in here. Valuing cards is a key skill in collectible card games and, depending on how much luck is baked into a particular game’s systems, can be the most important one. While draws will always play a part, making sure you have the most powerful cards possible in your deck is one of the easiest ways to win, next to tight play. Breaking it down into a formula can help or hinder this process. It can show that solid but non-flashy cards like Chillwind Yeti are still powerful without any additional effects. It’s less useful as cards become more complex, however, as the debate over Elie’s analysis of the variable-power Van Cleef card shows.
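To make that idea concrete, here’s a minimal sketch of this kind of valuation – a least-squares fit of mana cost against card stats. The minions and their stats are real, but the feature set (attack, health, taunt) and the whole setup are invented for illustration, not Elie’s actual model:

```python
import numpy as np

# Each card: ([attack, health, has_taunt], printed mana cost).
cards = {
    "Chillwind Yeti":      ([4, 5, 0], 4),
    "Boulderfist Ogre":    ([6, 7, 0], 6),
    "River Crocolisk":     ([2, 3, 0], 2),
    "Sen'jin Shieldmasta": ([3, 5, 1], 4),
    "Magma Rager":         ([5, 1, 0], 3),
}

X = np.array([features for features, _ in cards.values()], dtype=float)
y = np.array([cost for _, cost in cards.values()], dtype=float)

# Fit: cost ~ w_attack*attack + w_health*health + w_taunt*taunt.
w, *_ = np.linalg.lstsq(X, y, rcond=None)

# A card whose printed cost sits below what its stats "should" cost
# comes out as undervalued under this (toy) metric.
for name, (features, cost) in cards.items():
    predicted = float(np.dot(w, features))
    verdict = "undervalued" if predicted > cost else "fairly priced or worse"
    print(f"{name}: costs {cost}, stats worth ~{predicted:.1f} -> {verdict}")
```

The appeal of this approach is exactly the Chillwind Yeti case: quiet stat efficiency shows up in the numbers even when a card has no flashy text.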

The tool he uses to predict an opponent’s cards was originally going to be released, but in this post he explains that he decided not to after Blizzard contacted him. They were apparently very supportive of his research into balancing, but felt the tool would provide an unfair advantage. They also argued that it made the game less fun by removing decision-making from the player.

Elie’s stopping work on the Hearthstone project for a few weeks, but says he’ll be back to talk more about game balance and prediction in the future. His algorithms for deciding which cards are under- or overvalued are still imperfect, but I do want to see more done with them. There’s ongoing discussion in the vast comment sections of both blog posts. He advises people to keep an eye on his Twitter feed for any updates.


  1. mtomto says:

    It might be fun to make, but it certainly kills the fun of the game.

  2. Reapy says:

    I thought it was really interesting to see how you’d actually go about weighing and analysing something like Hearthstone – how you’d try to figure out the value of each card against the others. That’s a pretty cool skill that could be applied to a lot of things, and it helps that by playing the game you can get a feel for what the values should be, enough to sanity-check your work.

    I don’t know that the tool here would really identify much more than a player would get a feel for just by playing. Even in the example, the top predicted card wasn’t the card played – it was one of three – and when you’ve played against some of those decks enough, you pretty much already know what’s going to come out; there aren’t many surprises.

  3. AIAndy says:

    The linear regression approach to analysing the card value misses a lot of the interactions that even the basic card stats have.
    For linear regression to work, the card stats have to be independent in their value and linear in their power. Neither is true.
    The health of 0-attack creatures is considerably less valuable than the health of 2-attack creatures.
    Creatures with less attack than the typical health of cards at their mana level cannot trade well, so they are far less valuable unless they have considerably more health; and that additional health is worth less per point.
    On the other hand, a single extra point of health that lifts a creature above the normal damage threats at its mana level is very valuable.
    1-health creatures without Divine Shield are bad in general because of the hero powers of about half the heroes.
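    AIAndy’s independence point can be sketched in a few lines: fit a toy data set whose “true” value multiplies attack and health, first with purely additive features, then with an attack×health interaction term added. All numbers here are invented purely for illustration:

```python
import numpy as np

# (attack, health) pairs with a made-up "true" value in which the
# stats multiply rather than add: value = a*h/2 + (a + h)/4.
minions = [(0, 4), (2, 4), (4, 2), (3, 3), (1, 5), (5, 1)]
value = np.array([a * h / 2 + (a + h) / 4 for a, h in minions])

features = {
    "additive (a, h)":        np.array([[a, h] for a, h in minions], float),
    "interaction (a, h, ah)": np.array([[a, h, a * h] for a, h in minions], float),
}

errs = {}
for name, X in features.items():
    w, *_ = np.linalg.lstsq(X, value, rcond=None)
    errs[name] = float(np.abs(X @ w - value).max())
    print(f"{name}: worst fit error {errs[name]:.3f}")
```

The additive fit cannot express “health is worth more on a minion that can attack”, so it misprices the 0-attack minion badly; the interaction column absorbs exactly that effect.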

  4. RanDomino says:

    They’re using some really iffy parameters in their card valuation. Mainly: a card that is bigger or deals more direct damage is going to be more expensive and therefore less mana-efficient (i.e. Fireball vs Pyroblast), which is why their algorithm kicked out a bunch of 0-2 cost cards as being “undervalued”. What they’re missing is that cards themselves are also scarce. Boulderfist Ogre and Pyroblast have 50% more meat than Chillwind Yeti or Fireball for the same number of cards. Wisp is not undervalued; it’s garbage, because it has practically no effect on the game but costs an entire card which could have been something else. If they accounted for the fact that each card costs 1 card, their results would be much more relevant.
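    One way to encode RanDomino’s “each card costs 1 card” suggestion in a regression is simply an intercept column, so the fit can charge a flat per-card amount before any stats are bought with mana. A sketch with real minion stats, but none of the authors’ actual machinery:

```python
import numpy as np

# Real minion stats and printed costs; the modelling choice (an
# intercept as the "price of a card slot") is illustrative only.
minions = {
    "Wisp":             ([1, 1], 0),
    "River Crocolisk":  ([2, 3], 2),
    "Magma Rager":      ([5, 1], 3),
    "Chillwind Yeti":   ([4, 5], 4),
    "Boulderfist Ogre": ([6, 7], 6),
    "Core Hound":       ([9, 5], 7),
}
stats = np.array([s for s, _ in minions.values()], dtype=float)
cost = np.array([c for _, c in minions.values()], dtype=float)

designs = {
    "no intercept":   stats,
    "with intercept": np.hstack([stats, np.ones((len(cost), 1))]),
}

fits = {}
for name, X in designs.items():
    w, *_ = np.linalg.lstsq(X, cost, rcond=None)
    fits[name] = float(np.sum((X @ w - cost) ** 2))
    print(f"{name}: squared fit error {fits[name]:.2f}, "
          f"weights {np.round(w, 2)}")
```

Adding the extra column can only tighten the fit, and the fitted constant is an estimate of what simply occupying a deck slot is worth – the baseline a Wisp has to beat before it counts as a bargain.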

    • Moraven says:

      Wisp is infinite value.

    • SuddenSight says:

      Their algorithm is not as bad as it might look at first. They do account for the value of a card (though confusingly, he refers to it as the “cost” of a card and puts the variable on the right side of the equation, even though the “cost” of a card is negative).

      There are a number of more subtle problems:

      1) The model only compares cards to other cards with equivalent effects. Thus, in general, it will tend to find cards that are undervalued in absolute terms. The algorithm will not find the hidden synergies that lead to broken combos, such as the now-nerfed hunter Unleash the Hounds decks. Even moderate synergies, such as the greater value Inner Fire puts on high-health minions, are difficult to judge with this method. So it is not surprising that the algorithm turns up low-hanging fruit like Argent Squire as undervalued – it is one of the best 1/1s in the game, and doesn’t rely on synergies to reach its potential.

      2) Some aspects of the algorithm depend on starting assumptions. Consider the case of Charge – he initially tried Charge as a static value, but later decided to multiply by the attack stat, because the value of Charge depends strongly on attack. These implementation choices can drastically affect which cards come out undervalued or overvalued.

      3) His approach to dealing with cards that have unique effects is interesting, but flawed. By only considering cases where cards are actually played, he biases the results towards greater value, because players are more likely to play a card in a situation where it has high value. The Van Cleef example is perfect for this – no one is going to play a naked Van Cleef, but many players will prioritize Van Cleef if the situation seems good for it; they might even waste other spell cards to pump VC. This makes VC look more valuable on average than it actually is.

      4) The subset of cards considered is pretty good, but he is only considering cards that work for his algorithm, so he overlooks some important ones. The most important oversight (in my opinion) is hero innates. These are actually very important to the game balance – they *don’t* cost a card, so they don’t carry the intrinsic “card value,” but they *do* give pretty sizable bonuses considering that they can be cast turn after turn. I think including hero innates in the data set would drive the intrinsic card value up.

      5) He compares across all classes equally. I understand why he did this, but I would rather see the data set for each class processed separately. That would better show how the value of a card differs from class to class, because different classes have an easier time getting certain abilities. For example, mages have less use for Stonetusk Boars than a warlock would, because they have an innate burn spell.
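      The selection bias described in point 3 is easy to simulate with made-up numbers: give a hypothetical card a situational value that varies from game to game, and only record that value on the turns a player would actually cast it:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical card whose value swings with the board state:
# average worth 5, standard deviation 2 (invented numbers).
situational_value = rng.normal(loc=5.0, scale=2.0, size=100_000)
true_mean = situational_value.mean()

# Players hold the card unless this turn makes it worth at least ~6.
observed = situational_value[situational_value > 6.0]
observed_mean = observed.mean()

print(f"true average value:           {true_mean:.2f}")
print(f"average when actually played: {observed_mean:.2f}")
```

Only counting the games where the card hits the table inflates its apparent value well above its true average – the same effect predicted above for Van Cleef.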

      On the other hand, the nice thing about this model is its simplicity. A perfect model would be very complex, and require too much time to write and understand. This model at least gives us numbers to think about, and shows the value of the cards according to one well-defined set of metrics (I do wish he’d post all the equations he used, however).

      Sorry for the long post. Neat math, even if the results are of questionable value.

      • Koozer says:

        On the other hand, the nice thing about this model is its simplicity. A perfect model would be very complex, and require too much time to write and understand.

        This is my biggest problem with it. It’s so simple as to be quite useless (he ranks Light’s Justice as the most undervalued card, for example), but anything more complex and he’ll be heading into P-NP territory. Unless of course it is solvable, and he comes up with the perfect algorithm to pick the best cards in every card game ever made.

        The use of machine learning to predict moves seems much more fruitful.

      • Ben Barrett says:

        Rad post, thanks for this. I echo a lot of your thoughts.