Decision Tree Modeling of First Reactions in Heads-Up No Limit Hold'em Poker

Li Yang

my email


Course: EECS 349 - Machine Learning

Term: Fall 2010

Professor: Doug Downey



Background

In Heads-up Texas Hold'em Poker, the tournament is played between two players, Player 1 and Player 2. When a round of play starts, each player receives $20,000 in chips. The tournament consists of four hands of play. All chips bet by the players during a hand are put into the pot, to be won by whoever wins the hand. A hand is initiated by each player receiving two cards, hidden from the other player. A marker, known as the button, shows which of the players is to act first. After each hand, the button moves to the other player. Each hand starts with the players making two forced bets, known as the small blind and the big blind respectively, with the player with the button posting the small blind. Posting the blinds initiates the first of four betting rounds in following order: preflop, flop, turn, river. In the preflop round, no cards are on the board. In the flop round, three cards are placed on the board before betting starts. One additional card is added in the turn and river rounds before betting begins. When betting, each player to act has three options:

As suggested by the professor, a decision tree is a simple way to classify the response (check/call, raise, or fold) to an opponent's first action (check or bet) with the objective of winning as much money as possible from the current hand. Attributes from the current round of play that are considered include: opponent's first action (check or raise), amount of money opponent bets (for raise only), the probability of winning knowing your own hand and cards on the board (if any), and the probability of winning knowing the opponent's hand. Attributes from the previous round of plays that are considered include: percentage of calls that are raises, probability of winning knowing your own hand, probability of winning knowing the opponent's hand, and final amount of money each player has bet. The purpose of all this effort is to find interesting patterns in how one player determines the appropriate response to an opponent's action. For instance, we're interested in observing the influence of prior round attributes on current round actions. One example: does the percentage of raises made by an opponent in the preflop round factor in the response of the flop round? To answer these questions, we'll use the data from a website (http://www.computerpokercompetition.org) that hosts annual poker competitions between automated bots. Specifically we'll observe the actions of the 1st place winner, the Hyperborean bot by the University of Alberta, and the 2nd place contestant, the SartreNL bot. If any interesting differences between their play is observed, that will be reported as well.


Implementation

The log data from the computer bot competition contains all information from every round of play including the hand of the opponent. An example is as follows:

The respective players are Hyperborean (little blind) and Tartanian4 (big blind). Before the flop, Hyperborean raises to 300 and Tartanian4 calls. Following the flop, they both check. Following the turn card, both players check again. Following the river card, Hyperborean raises to 600 and Tartanian4 calls. We then see that Hyperborean had 4dTd in his hand i.e. 4 (diamonds) and 10 (diamonds), while Tartanian4 had a hand of 8cQc i.e. 8 (clubs) and Queen (clubs). The flop, turn, and river cards are: 3 (clubs) & Ace (hearts) & King (hearts), 10 (hearts), 3 (diamonds). Thus Hyperborean wins with two pair (10s over 3s) and gains the pot of 600. There are literally millions of rounds of play by each bot. We'll only observe 50,000 rounds of play by the top two bots, Hyperborean and SartreNL. A python program parses each line of data and extracts calculates relevant information needed for each round of play as follows: The most difficult aspect is calculating the probability of winning given known hands and cards on the table. To do so, a Monte Carlo simulation of possible future board cards must be done. Luckily an open source program from the site http://pokersource.sourceforge.net performs this simulation. However, its calculated probability is not the likelihood of winning, but what percentage of the pot will be won. Nevertheless it's a very good gauge for probability of winning, and every poker bot uses some form of a similar program. Once the data is cleansed and ready for evaluation, it's loaded into Weka and analyzed using the J48 algorithm (aka C4.5) with 10-fold cross-validation.

.
Results

The chart below shows the accuracy of classification given different number of attributes for the Hyperborean program.

This next chart below shows the accuracy of classification given different number of attributes for the SartreNL program.

Notice that both charts show two clear trends:


Future Work


Detailed Paper

For more details about our project, please refer to this final paper .


References

[1] T. M. Mitchell. Machine Learning, WCB McGraw-Hill, 1997.

[2] Ian H. Witten, Eibe Frank. Data Mining: Practical Machine Learning Tools and Techniques, Second Edition, Morgan Kaufmann, 2005

[3] Peter Bro Miltersen, Troels Bjerre Sorensen. A Near-Optimal Strategy for a Heads-Up No-Limit Texas Hold'em Poker Tournament. Retrieved from http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.69.3417&rep=rep1&type=pdf