Decision Tree Modeling of First Reactions in Heads-Up No Limit Hold'em Poker
Li Yang
Course: EECS 349 - Machine Learning
Term: Fall 2010
Professor: Doug Downey
Background
In Heads-up Texas Hold'em Poker, the tournament is played between two players, Player 1 and Player 2. When a round of play starts,
each player receives $20,000 in chips. The tournament consists of four hands of play. All chips bet by the players during a hand are
put into the pot, to be won by whoever wins the hand. A hand is initiated by each player receiving two cards, hidden from the other player.
A marker, known as the button, shows which of the players is to act first. After each hand, the button moves to the other player.
Each hand starts with the players making two forced bets, known as the small blind and the big blind respectively, with the player with
the button posting the small blind. Posting the blinds initiates the first of four betting rounds in following order:
preflop, flop, turn, river. In the preflop round, no cards are on the board. In the flop round, three cards are placed
on the board before betting starts. One additional card is added in the turn and river rounds before betting begins.
When betting, each player to act has three options:
- Fold and forfeit hand. The other player wins the pot.
- Call and match the number of chips bet so far.
- Raise and bet additional chips beyond what is needed for a call.
As suggested by the professor, a decision tree is a simple way to classify the response (check/call, raise, or fold) to an opponent's first action (check or bet) with the objective of winning as much money as possible from the current hand. Attributes from the current round of play that are considered include: opponent's first action (check or raise), amount of money opponent bets (for raise only), the probability of winning knowing your own hand and cards on the board (if any), and the probability of winning knowing the opponent's hand. Attributes from the previous round of plays that are considered include: percentage of calls that are raises, probability of winning knowing your own hand, probability of winning knowing the opponent's hand, and final amount of money each player has bet.
The purpose of all this effort is to find interesting patterns in how one player determines the appropriate response to an opponent's action. For instance, we're interested in observing the influence of prior round attributes on current round actions. One example: does the percentage of raises made by an opponent in the preflop round factor in the response of the flop round? To answer these questions, we'll use the data from a website (http://www.computerpokercompetition.org) that hosts annual poker competitions between automated bots. Specifically we'll observe the actions of the 1st place winner, the Hyperborean bot by the University of Alberta, and the 2nd place contestant, the SartreNL bot. If any interesting differences between their play is observed, that will be reported as well.
Implementation
The log data from the computer bot competition contains all information from every round of play including the hand of the opponent. An example is as follows:
- Hyperborean_iro|Tartanian4_tbr:0:b50b100r300c300/c300c300/c300c300/r600c600:4dTd|8cQc/3cAhKh/Th/3d:600|-600
The respective players are Hyperborean (little blind) and Tartanian4 (big blind). Before the flop, Hyperborean raises to
300 and Tartanian4 calls. Following the flop, they both check. Following the turn card, both players check again. Following
the river card, Hyperborean raises to 600 and Tartanian4 calls. We then see that Hyperborean had 4dTd in his hand i.e.
4 (diamonds) and 10 (diamonds), while Tartanian4 had a hand of 8cQc i.e. 8 (clubs) and Queen (clubs). The flop, turn,
and river cards are: 3 (clubs) & Ace (hearts) & King (hearts), 10 (hearts), 3 (diamonds). Thus Hyperborean wins with
two pair (10s over 3s) and gains the pot of 600.
There are literally millions of rounds of play by each bot. We'll only observe 50,000 rounds of play by the top two bots,
Hyperborean and SartreNL. A python program parses each line of data and extracts calculates relevant information needed
for each round of play as follows:
- Flop: % raises by each player (preflop), probability of winning when knowing own hand (preflop), probability of winning when knowing opponent's hand and own hand (preflop), final amount each player has bet (preflop), opponent's first action (flop), opponent's bet amount (flop), probability of winning when knowing own hand (flop), probability of winning when knowing opponent's hand and own hand (flop)
- Turn: all preflop and flop information, opponent's first action (turn), opponent's bet amount (turn), probability of winning when knowing own hand (turn), probability of winning when knowing opponent's hand and own hand (turn)
- River: all preflop and flop and turn information, opponent's first action (river), opponent's bet amount (river), probability of winning when knowing own hand (river), probability of winning when knowing opponent's hand and own hand (river)
The most difficult aspect is calculating the probability of winning given known hands and cards on the table. To do so,
a Monte Carlo simulation of possible future board cards must be done. Luckily an open source program from the site
http://pokersource.sourceforge.net performs this simulation. However, its calculated probability is not the likelihood
of winning, but what percentage of the pot will be won. Nevertheless it's a very good gauge for probability of winning,
and every poker bot uses some form of a similar program.
Once the data is cleansed and ready for evaluation, it's loaded into Weka and analyzed using the J48 algorithm (aka C4.5)
with 10-fold cross-validation.
.
Results
The chart below shows the accuracy of classification given different number of attributes for
the Hyperborean program.
This next chart below shows the accuracy of classification given different number of attributes for
the SartreNL program.
Notice that both charts show two clear trends:
- Attributes from prior rounds barely improve the accuracy of classification
- The later the round, the better the accuracy of classification
Future Work
- Add more attributes for evaluation
- Use Bayesian Network to determine current round actions based on prior round attributes
- Use more sophisticated modeling of winning probability
- More representative sampling
Detailed Paper
For more details about our project, please refer to this final paper .
References
[1] T. M. Mitchell. Machine Learning, WCB McGraw-Hill, 1997.
[2] Ian H. Witten, Eibe Frank. Data Mining: Practical Machine Learning Tools and Techniques,
Second Edition, Morgan Kaufmann, 2005
[3] Peter Bro Miltersen, Troels Bjerre Sorensen. A Near-Optimal Strategy for a Heads-Up No-Limit Texas
Hold'em Poker Tournament. Retrieved from http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.69.3417&rep=rep1&type=pdf