How to win silver in the CodeCup
Every year CodeCup organizes a contest for a game like chess, go, or Othello. But instead of playing the games yourself, you create a bot which plays the game! And instead of a well-known game like chess, an unfamiliar game is chosen.
This year we had to create a bot for Entropy, where Order and Chaos compete, each with different move possibilities. Last year (Spaghetti) I finished 9th of 56, quite a good result. Yet for Entropy I exceeded my expectations by reaching 2nd place of 75. In this article, I will show how I managed to do this. Besides being a record of my efforts, I hope this article can guide you in future contests!
Contents:
- The game Entropy
- How to handle chance?
- Estimating the future with reinforcement learning
- What is important in Entropy?
- The neural network that did not work
- Competition result
- Appendix: Feedback from the CodeCup forums
- Appendix: How to deal with variation and uncertainty
- Appendix: on ideas, mistakes and making decisions
The game Entropy
In Entropy, the players take turns trying to achieve their competing goals. Order tries to create patterns and Chaos tries to prevent those patterns. The board starts empty in every match; every turn, Chaos draws a random chip from the bag and places it on an empty field.
Example gameplay
Source: CodeCup
After Chaos places a chip, it is Order's turn. Order can choose to move one chip orthogonally or to skip their turn. For example, in the picture above Order can move the blue chip on Dd only to Dc, Cd, Bd or Ad. After that, Chaos takes a turn again. Once the board is full, the match ends and the points are counted.
Point calculation
Order scores points for the length of every horizontal and vertical palindrome (a pattern of colors that reads the same in both directions).
See the full rules or an example game for more details.
Deep dive: how to efficiently calculate points
To calculate the points, you just have to check all palindromes. Easier said than done, right? But this first insight helps: all palindromes are by definition symmetric around their center position. This center position is the middle of a field for odd-length palindromes, and between two fields for even-length palindromes. So we can just try all center positions and see how long each palindrome is:
```
points = 0

//Even-length palindromes
//The palindrome's center is just after the field referred to
//E.g. center = 2 implies that the center is between field 2 and 3
for center in [0, 1, 2, 3, 4, 5] {
    for length in [2, 4, 6] {
        left = center + 1 - length / 2
        right = center + length / 2
        //Left and right must be a valid field
        if left < 0 || right > 6 { break }
        //Left and right must have a filled field
        if colors[left] == empty || colors[right] == empty { break }
        //The colors of left and right must match
        if colors[left] != colors[right] { break }
        points += length
    }
}

//Odd-length palindromes
//The palindrome's center is the middle of the field referred to
for center in [1, 2, 3, 4, 5] {
    //Any color chip works for the center field, so we skip it
    for length in [3, 5, 7] {
        left = center - (length - 1) / 2
        right = center + (length - 1) / 2
        //Left and right must be a valid field
        if left < 0 || right > 6 { break }
        //Left and right must have a filled field
        if colors[left] == empty || colors[right] == empty { break }
        //The colors of left and right must match
        if colors[left] != colors[right] { break }
        points += length
    }
}
```
That is a good implementation already, but we can do even better by using SIMD instructions. These allow us to compute all palindromes in just a few steps, without loops. To do so, we need to change our code:
- to always execute the same thing, without any conditional branches; and
- to only load array data according to the index of the main loop.
```
points = 0

//Even-length palindromes
for center in [0, 1, 2, 3, 4, 5] {
    still_running = true
    for half_length in [0, 1, 2] {
        //Move the fields half_length to either side
        //(We interpret the color array as a 64-bit number)
        //Corresponds with indexing colors[left] and colors[right]
        left = colors << (half_length * 8)
        right = colors >> (half_length * 8 + 8)
        //We replace the ifs and breaks with a boolean
        still_running = still_running
            && center >= half_length
            && center + half_length + 1 <= 6
            && left[center] != empty
            && right[center] != empty
            && left[center] == right[center]
        //If the palindrome is still running, add the points
        points += (still_running * 0xff) & (half_length * 2 + 2)
    }
}

//And similar for odd-length palindromes
```
We now remove the inner loop by duplicating the code three times, and that makes the code SIMD-ready. The compiler can convert this to SIMD instructions in theory, but does not do this well enough here. By making good use of the Intel Intrinsics Guide, we manually make the code run for all center positions at once. Writing this type of code is hard, so it is useful to keep a simple version for (automatically) comparing results. From there, there are only a few things left to improve. For example, can you think of a way to combine the even and odd palindrome calculation with SIMD? Hint: make the even/odd loops as similar as possible, and see if you can handle the differences outside of the loop.
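To make this more concrete, here is a sketch of how the even-length half could look with SSE2 intrinsics. This is an illustration rather than my exact competition code: the zero-padded buffer, the `even_step` helper and the use of `__builtin_popcount` are my assumptions here, but the shift/compare/mask structure is the same as in the branchless version above.

```cpp
#include <emmintrin.h>  // SSE2
#include <cstdint>
#include <cstring>

// Score the even-length palindromes of one 7-field line, all center positions at once.
// Field values: 0 = empty, 1..7 = chip colors.
template <int HALF_LENGTH>
static inline void even_step(const __m128i v, __m128i& running, int& points) {
    // Byte c of `left` is line[c - HALF_LENGTH], byte c of `right` is line[c + HALF_LENGTH + 1].
    // Off-board positions become 0 (= empty) and fail the checks below, which replaces
    // the explicit bounds checks of the scalar version.
    const __m128i left      = _mm_slli_si128(v, HALF_LENGTH);
    const __m128i right     = _mm_srli_si128(v, HALF_LENGTH + 1);
    const __m128i equal     = _mm_cmpeq_epi8(left, right);
    const __m128i not_empty = _mm_xor_si128(_mm_cmpeq_epi8(left, _mm_setzero_si128()),
                                            _mm_set1_epi8(-1));
    running = _mm_and_si128(running, _mm_and_si128(equal, not_empty));
    // Every center still alive contributes a palindrome of length 2*HALF_LENGTH + 2.
    const int alive = _mm_movemask_epi8(running) & 0x3f;       // centers 0..5 only
    points += __builtin_popcount(alive) * (2 * HALF_LENGTH + 2); // GCC/Clang builtin
}

static inline int even_palindrome_points(const uint8_t line[7]) {
    alignas(16) uint8_t buf[16] = {};      // bytes 7..15 stay 0 and act as "off board"
    std::memcpy(buf, line, 7);
    const __m128i v = _mm_load_si128(reinterpret_cast<const __m128i*>(buf));
    __m128i running = _mm_set1_epi8(-1);   // all centers still alive
    int points = 0;
    even_step<0>(v, running, points);
    even_step<1>(v, running, points);
    even_step<2>(v, running, points);
    return points;
}
```

The odd-length palindromes follow the same pattern.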
Now your code can calculate palindromes millions of times per second. A very smart trick I did not think of, but other contestants did: there are not that many possible chip patterns for a single line, so you can simply precompute the points for all of them (see the forum feedback in the appendix).
For making a bot, most aspects of the game are “standard”: it is a two-player perfect information zero-sum game. The branching factor and game lengths are manageable too. However, there are several interesting aspects to consider when making a bot. First, the game is asymmetric: Chaos and Order have completely different moves. That is not too hard to handle, except that it can lead to duplication of code if you are not careful (I was not that careful, unfortunately). Secondly, the game has an element of chance, by picking random chips from the bag. This has a large impact, since the commonly used minimax and Monte Carlo tree search techniques become much less efficient when handling this correctly.
How to handle chance?
One of the starting points for writing a bot is the minimax algorithm. This looks at all your available moves to choose the best one. For each move, it looks at all the following moves by the opponent and chooses the best one for them (worst for you). At some point, you stop, because you do not have enough computation time to evaluate the complete game. So you stop after looking ahead some number of moves, and use an evaluation function to say how good each game situation is (e.g. how many points have been scored so far).

Minimax. Source: Introduction to AI, University of Helsinki, unmodified, CC BY-NC-SA 4.0
Expectimax is the extension of minimax to include “chance nodes”. Whenever we have to draw a random chip from the bag, we just try all possible colors. There is no “best chip color”, of course, but we combine the results by adding them — weighted by the probability of drawing each color. That single change is all we need!
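As a small illustration, this is roughly what a chance node looks like. It is a sketch with placeholder names (`State`, `chaos_node`, `evaluate`, `after_drawing`), not my actual engine code:

```cpp
// Expectimax chance node: instead of taking a max or a min, average the results
// over the chip colors still in the bag, weighted by their draw probability.
double chance_node(const State& s, int depth) {
    if (depth == 0 || s.board_full()) return evaluate(s);
    double expected = 0.0;
    const int chips_left = s.total_chips_in_bag();
    for (int color = 1; color <= 7; ++color) {
        const int count = s.chips_in_bag(color);
        if (count == 0) continue;
        const double p = double(count) / chips_left;                 // probability of this color
        expected += p * chaos_node(s.after_drawing(color), depth);   // Chaos places the chip next
    }
    return expected;
}
```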
Unfortunately, if we stop here, we lose all the advantages of common minimax extensions, particularly alpha-beta pruning. I consider alpha-beta pruning an essential improvement, since it allows you to search more moves ahead. The key to alpha-beta pruning is that every time we ask for the value of a move, we also supply α — the best value already achieved by Order — and β — the best value already achieved by Chaos. If we encounter a move worse than α or β (depending on whose turn it is), we know that the moves leading to this situation are suboptimal. We stop evaluating this situation and save some time.
Ballard combined these two minimax extensions into the *-minimax family of algorithms, which brings alpha-beta pruning to trees with chance nodes. Does that recover all of alpha-beta's benefits?
Mostly, because there are a few tricks to apply. Ballard himself proposed “Star2”, where you look at each color just enough to get an upper bound (upper bounds on chip colors translate into a higher α when evaluating other colors). That is a decent point to stop when making your own bot. There are a few other tricks (StarETC and ChanceProbCut) I applied, but these only have a minor effect.
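To make the bound bookkeeping concrete, here is a sketch of the simpler Star1 cut-off at a chance node. It assumes the evaluation always lies between two bounds (the `VMIN`/`VMAX` values below are placeholders), `chaos_node` here also takes the search window, and Star2's probing would come on top of this:

```cpp
#include <algorithm>

constexpr double VMIN = 0.0;    // placeholder lower bound on the evaluation
constexpr double VMAX = 300.0;  // placeholder upper bound on the evaluation

// Star1: alpha-beta style pruning at a chance node. While summing over the colors,
// keep track of the best and worst the node can still become; stop as soon as the
// whole [alpha, beta] window is decided.
double star1_node(const State& s, int depth, double alpha, double beta) {
    double done = 0.0;         // probability-weighted sum of the colors evaluated so far
    double weight_left = 1.0;  // probability mass of the colors not evaluated yet
    for (int color = 1; color <= 7; ++color) {
        const double p = s.draw_probability(color);
        if (p == 0.0) continue;
        weight_left -= p;
        // Window for this color: outside of it, this color cannot change the outcome.
        const double child_alpha = (alpha - done - weight_left * VMAX) / p;
        const double child_beta  = (beta  - done - weight_left * VMIN) / p;
        const double v = chaos_node(s.after_drawing(color), depth,
                                    std::max(child_alpha, VMIN),
                                    std::min(child_beta,  VMAX));
        done += p * v;
        // Even in the best/worst case for the remaining colors, the result falls outside the window.
        if (done + weight_left * VMAX <= alpha) return alpha;  // fail low
        if (done + weight_left * VMIN >= beta)  return beta;   // fail high
    }
    return done;
}
```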
Finally, you can search moves in a smarter order: if you search good moves first, you can increase α/decrease β and thus save more time. In particular, I first try the move stored in the transposition table (if found), then the killer heuristic and then all moves sorted with the evaluation function.
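A sketch of that ordering, for an Order node (Chaos would sort the other way around). The `TTEntry`, `Killers`, `Move` and `State` types and their members are placeholders for whatever your engine uses:

```cpp
#include <algorithm>
#include <utility>
#include <vector>

// Order the moves: transposition-table move first, then killer moves,
// then the rest sorted by the evaluation function (best first for Order).
std::vector<Move> ordered_moves(const State& s, const TTEntry* tt, const Killers& killers) {
    std::vector<Move> moves = s.legal_moves();
    std::vector<std::pair<double, Move>> scored;
    scored.reserve(moves.size());
    for (const Move& m : moves) {
        double priority;
        if (tt != nullptr && m == tt->best_move) priority = 1e9;  // stored best move first
        else if (killers.contains(s.ply(), m))   priority = 1e8;  // moves that caused cut-offs before
        else priority = evaluate(s.apply(m));                     // the rest: by static evaluation
        scored.emplace_back(priority, m);
    }
    std::sort(scored.begin(), scored.end(),
              [](const auto& a, const auto& b) { return a.first > b.first; });
    for (size_t i = 0; i < moves.size(); ++i) moves[i] = scored[i].second;
    return moves;
}
```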
Deep dive: more thoughts on lower and upper bounds
Competitor Tomek Czajka posted a good idea on the CodeCup forums, which I have not seen described anywhere before. To appreciate this idea, we must first look at alpha-beta pruning from another angle. Instead of interpreting α and β as the best known moves, we can say they are a window on the value we are interested in: if the true value lies inside the window, the search returns it exactly; if it lies outside, we only learn on which side it lies, i.e. a lower or upper bound.
Visual representation of alpha-beta output in situation s₁, s₂ and s₃.
With this view, we can also run the alpha-beta algorithm (run a query) with different α and β than our best-known moves. Generally, running a query with a smaller window is much faster, because far more branches can be pruned, at the cost of often getting only a bound instead of an exact value.
Coming back to
Let us compare this idea to Ballard's Star2. In Star2 we do “probing”: we try only one move of Chaos with a drawn color chip. Because Chaos could have chosen other moves, and Chaos wants to minimize the points, this results in an upper bound on the value. Tomek's approach will do the same with
I think this idea is genius, and potentially allows a large speedup. Thinking a bit more myself, we can generalize the idea of searching and lower/upper bounds. Currently,
Deep dive: on playing against weaker opponents
In the competition format of this year's CodeCup, you play a game against every opponent. To determine the final ranking, you sum all the points you scored as Order and subtract all the points your opponents scored while you played Chaos. As a result, it is important not just to win, but also to win by a large point margin. In particular, this means we can and must play better against some of the weaker opponents.
What do I mean by that? Well, by using the search algorithm above, we already take advantage of the mistakes the opponent makes. However, we assume that the opponent responds strongly. As a result, we do not take large risks, since we could lose more than we gain. Against weaker opponents, those risks pay off much more often, so we should take them.
So let us detect weaker opponents. At first, I found this hard to do: the number of points scored so far was an unreliable indicator throughout most of a match, and the same held for the outcome of the search algorithm. It seemed to me that chance played too large a role. But then I figured out a way to separate chance from good decisions: first I calculate the best move and score with my search algorithm as if I were playing as the opponent (i.e. as Chaos), using the same color chip the opponent drew from the bag. Then I compare that with the score corresponding to the opponent's actual move. The difference between these scores is due to skill, not luck. If you are familiar with multi-armed bandits: I am trying to estimate the opponent's regret. Summing these differences up during the match results in a very reliable indicator of opponent strength. When the opponent's strength is low enough, I switch to the optimistic algorithm below. If that turns out to be a mistake, my program automatically switches back later on.
Evaluating the strength of the opponent is difficult because it typically includes the effect of random colors.
After I (Order) move, the opponent (Chaos) draws a random chip from the bag.
Comparing the opponent's move with the best move using the same chip color is a reliable strength indicator.
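In code, the bookkeeping is roughly as follows. This is a sketch: `best_score_with_chip` and `score_after_move` stand in for running my normal search from Chaos' point of view, and the threshold is a placeholder constant.

```cpp
// Running estimate of the opponent's "regret": how many points Chaos gives away
// compared to the best move with the exact same drawn chip. Luck cancels out,
// because both searches use the chip the opponent actually drew.
double opponent_regret = 0.0;

void after_chaos_move(const State& before, int drawn_color, const Move& played) {
    // Value (from Order's point of view) if Chaos plays the chip as well as possible...
    const double best   = best_score_with_chip(before, drawn_color);
    // ...versus the value after the move the opponent actually played.
    const double actual = score_after_move(before, drawn_color, played);
    opponent_regret += actual - best;  // >= 0, since Chaos wants to minimize Order's score
}

bool opponent_is_weak() {
    return opponent_regret > WEAKNESS_THRESHOLD;  // placeholder constant
}
```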
Now we have to decide how to take more risks and play more optimistically. The ideal approach would be to adapt my normal search algorithm to assume the opponent plays randomly. However, this is too slow, as good plans often take three or more Order moves — unlike the typical two moves needed against stronger opponents. So I settled on the next best thing, which is even more optimistic: I ignore the chips Chaos could potentially place. I only try moving chips — only Order moves — and hope the opponent does not block these moves by placing a chip on the wrong spot. Except at the end of the game, this works out very well! Moreover, it even works decently against the weakest opponents that do not play randomly. Nevertheless, to be safe, I added a small preference for playing point-scoring moves earlier rather than later.
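A sketch of that optimistic search (illustrative only; the `State`/`Move` helpers are placeholders, and the 0.99 factor plays the role of the small preference for scoring points sooner rather than later):

```cpp
#include <algorithm>

// Optimistic search against weak opponents: ignore Chaos entirely and only look
// at our own chip moves, hoping nothing gets blocked in the meantime.
double optimistic_order(const State& s, int depth) {
    if (depth == 0) return 0.0;              // no look-ahead left
    double best = 0.0;                        // skipping the turn scores nothing extra
    for (const Move& m : s.order_moves()) {   // only Order moves, no chance or Chaos nodes
        const State next = s.apply(m);
        const double gained = next.points() - s.points();  // can be negative when breaking a palindrome
        best = std::max(best, gained + 0.99 * optimistic_order(next, depth - 1));
    }
    return best;
}
```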
This technique regularly scores 20-30 points more than the normal search algorithm, thus contributing to my high ranking. Ideally, my normal algorithm alone would play well enough, but that was just not sufficient for Entropy and this competition format.
Estimating the future with reinforcement learning
We use our search algorithm (*-minimax) to decide what is the best move, and to estimate the value of the current game state. To do this well, we need a good evaluation function. The evaluation function gives a value to each game situation: assuming optimal play, what is the expected final game score? We can completely create this function by hand, but doing so is hard. Reinforcement learning can help us here!
Still, there is much to decide yourself. First, you need to decide which inputs the evaluation function gets; more on that in the next section.
Then we have to decide what the evaluation function itself looks like and how to train it. I used a linear function of the inputs and trained its coefficients with Q-learning: after every move, the prediction for the current state is nudged towards the points scored plus the prediction for the resulting state.

Q-learning. Source: Reinforcement learning, Richard S. Sutton and Andrew G. Barto, unmodified, CC BY-NC-ND 2.0
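A sketch of a single training step for such a linear function (illustrative; `inputs` is a placeholder that computes the feature vector of a state, and in reality I used the Adam optimizer rather than this plain gradient step):

```cpp
#include <vector>

// Linear evaluation: a weighted sum of the inputs.
double predict(const std::vector<double>& w, const std::vector<double>& x) {
    double value = 0.0;
    for (size_t i = 0; i < w.size(); ++i) value += w[i] * x[i];
    return value;
}

// One Q-learning step: nudge the prediction for state s towards the target, which is
// the points scored during the look-ahead plus the prediction of the state reached there.
void q_update(std::vector<double>& w, const State& s, double target, double learning_rate) {
    const std::vector<double> x = inputs(s);
    const double error = target - predict(w, x);
    for (size_t i = 0; i < w.size(); ++i) w[i] += learning_rate * error * x[i];
}
```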
I implemented a few improvements on top of default Q-learning, which improve the resulting evaluation function or speed up the training. First, I used multi-step Q-learning: instead of only looking one move ahead as with normal Q-learning, I let the search algorithm look two or more moves ahead. This does not always work in reinforcement learning, but it works well in games because the players act adversarially. The second modification concerns which game states I train the evaluation function on. Normally, you only train on the states you actually encounter while playing. However, when using the TreeStrap algorithm, you train on many of the game states you encounter inside the search algorithm as well. Instead of a single training step, you get thousands of (related) training steps for free, which combine into one really effective training step! A second benefit is that the search algorithm actually visits these states, so it is useful to predict them well.
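The TreeStrap part is then a small loop on top of the update above (a sketch; `SearchTree`, `visited_nodes` and `search_value` are placeholders for whatever your search tree stores):

```cpp
// TreeStrap: besides the state that was actually played, train every state visited
// during the search towards the search value computed for it.
void treestrap_update(std::vector<double>& w, const SearchTree& tree, double learning_rate) {
    for (const SearchNode& node : tree.visited_nodes()) {
        q_update(w, node.state, node.search_value, learning_rate);
    }
}
```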
The last thing to consider in reinforcement learning is your training curriculum: from which example matches do we want to learn? You need many matches of the game (the matches from the test competitions alone are not enough), so you will need self-play: your bot playing against itself or against other bots you have. The matches need to be varied enough to learn from, but still representative of how you will play during the competition. In games other than Entropy, this can mean you sometimes need to play suboptimally, just to get the needed variation. For Entropy this was easy: drawing random color chips from the bag causes enough variation between matches already. Only the first few turns lack this variation, so a small number of random moves there was useful.
What is important in Entropy?
As I said above, it is important to decide what the inputs of your evaluation function should be. This can simply be the state of the game, but that makes your evaluation function do all the hard work of interpreting the board. Handcrafted inputs let you put your knowledge of the game into the function, which makes learning much easier.
It is hard to give general advice, but here are a few guidelines to strive towards:
- Your inputs should have a (near) causal relation to the output y. Pure correlations do not generalize well to unseen situations. An example: in Entropy, I fixed the coefficient for the current points to 1. This means that the past points (whether few or many) cannot influence the future points. Having many points makes you inflexible, because there are more ways to lose points, but I think that other inputs should capture this inflexibility instead. One of the best inputs I used was the “largest threat”, since Chaos must act to prevent these points. Often, there is no defence, or the defending moves unlock other opportunities for Order. Thus, there is a strong causal relation to y.
- Your inputs should be as orthogonal as possible. Near-collinear inputs make training harder, because it is hard to distinguish the effect of each input. You can check the VIFs; anything below 10 should be fine, but lower is better. In practice, this means you should not have many inputs related to the same concept, but only a few. One useful trick for two related inputs: use their average and their difference; these will be near-orthogonal!
- Continuous or many-valued inputs are better than binary inputs. Continuous inputs give more nuanced information, which helps the predictions. Breaking up a continuous input into multiple binary inputs is more flexible, but usually not worth it.
- Think about how the gameplay changes during the game. Some inputs might not be important in the game opening, but turn out very relevant later on. If such things happen, it is useful to switch to different coefficients for every turn. This makes training harder, but can give better results.
For Entropy, I created the following inputs:
Example game state for the evaluation function.
Currently, it is Order's turn to play with 46 points scored so far. My evaluation function assigns 76.9 points to this game state, so 30.9 points can still be scored on average. The search algorithm gives the more accurate result of 75.4 points.
The corresponding best move is to move the orange chip on Fb to Fd, thereby unsurprisingly executing the largest threat of 6 points.
- Example: one palindrome of length 4 can be created, six palindromes of length 5, none of length 6 or 7.
- Example: the largest threat scores 6 points.
- Example: the remaining threats score 7 points.
- Example: there are three holes of size one, none of size two.
- Example: there are 18 chips on the outer ring (green) and 10 chips on the ring inside it (blue).
- Example: the distances to the center field sum up to 117.
- Example: the two least played colors — magenta and black — each have four chips remaining.
- Example: the number of fields reachable in two turns sums up to 88.
- Example: the number of fields reachable in two turns without losing points sums up to 18.
The neural network that did not work
Of course, I also tried using a neural network instead of a linear function. After all, neural networks are used with much success in many domains, with AlphaGo as a prime example. With the same inputs, I hoped to play a bit better too. I think I did everything right: I was already using the Adam optimizer for the linear function, I used one or two hidden layers, I tried different numbers of nodes and activation functions (ReLU and tanh), I standardized the inputs, and probably much more. However, it did not work out: it played worse than the linear function.
Still, I would like to be proved wrong, and learn how to work with neural networks in the future. If you are interested, please try it! The details on the dataset and challenge are below.
There are a few thoughts I have, but did not investigate. The first is that the training process may be unstable because I have per-turn weights. The second is that maybe I just need much more data.
Challenge details: do better than my linear function!
Challenge dataset (zipped CSV's)
The dataset contains the following fields:
- test_train: “train” or “test”, indicating if this observation belongs to the training or testing set.
- gameid: a unique id per played game. Can be used to group the different turns of a single game.
- state: describes the state of the game. The first character (C or O) indicates whose turn it is. “_” indicates an empty field, 1-7 indicate placed chips (each number is a different color).
- turn: the turn number.
- minimax (training set only): the result of the search algorithm, using my own evaluation function.
- heuristic: the score assigned to this game state by my evaluation function.
- outcome: the outcome of the game, as it was played.
- points: the current number of points.
- The other fields correspond to the inputs from the previous section.
The goal is to predict the minimax value as well as possible on the test set, using the same inputs I used, or (if you like) also using the current number of points or the raw board state. Designing new inputs to use is
I typically evaluate on turn 51, where my function achieves an R² of 89.3%. A regression directly on the dataset achieves an R² of 89.8%. Does your method do better? Then please contact me so we can see how well it works in actual play. If you send a method that works better than what I have (not only on R²), teach me something interesting, have a good discussion with me, or otherwise impress me, I will send you a packet of stroopwafels!
Contact me at placeholder, also if you just want to talk about this topic!
Competition result

Competition results
Source: CodeCup
I ended up in second place, a great achievement for me! I hoped to win, of course, but this result is better than I have achieved in any other year. To be fair, how much time you spend improving your bot is also crucial, and there is some luck involved in thinking of good ideas. Likewise, if the opponents also find good ideas and put in more effort, you will drop a few places. Nonetheless, I hope that reading this page can help you do better in the future.
Have you gotten interested in creating a bot too? Then you can join me in next year's CodeCup competition!
Feel free to contact me at placeholder
Appendix: Feedback from the CodeCup forums
When the competition finishes, there usually is a good discussion on the forum about the strategies everyone used. Here is a grab bag of things I found interesting:
- Most people used an expectimax variant, like me. Some people used Monte Carlo tree search, which has the advantage that you do not need to design an evaluation function.
- Many of the competitors had the idea to precompute all possible patterns of a line. I did not, yet it would have made my program roughly twice as fast! I am not sure this is a total loss, since accepting the slower speed did make me consider good but slow-to-calculate inputs.
- Some people played “colorless” chips with Chaos, which never form palindromes. This makes the search algorithm faster, but a bit less accurate.
- A few contestants computed the average score of random playouts, i.e. of filling the board with random color chips. This works well for a lot of games, so it is no surprise that it works for Entropy as well. I did think of this, but could not make it fast enough (without precomputed lines...). Experimentation shows that it would not have made my own evaluation function better, though.
- Many noted that ending the search with a move by Chaos did not help. Likely because the true effect of a chip placement is often only visible once Order moves it. I worked around this by allowing Order to move the just placed chip (only when stopping the search after Chaos' move). You can do this in roughly the same time as just evaluating Chaos' move.
Appendix: How to deal with variation and uncertainty
When repeatedly playing the game, there will be some variation in the outcome. When estimating the playing strength of your bots, and of those of the other competitors, you need to take this variation into account. So it is always useful to compute a standard deviation and a 95% confidence interval, showing you whether you might have gotten (un)lucky or whether the difference is real.
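For example, something as small as the sketch below (using a normal approximation) already tells you a lot about the per-match score differences between two bots:

```cpp
#include <cmath>
#include <vector>

struct Summary { double mean, sd, ci_low, ci_high; };

// Mean, sample standard deviation and a 95% confidence interval (normal approximation)
// for a series of per-match score differences between two bots.
Summary summarize(const std::vector<double>& scores) {
    const double n = static_cast<double>(scores.size());
    double sum = 0.0;
    for (double x : scores) sum += x;
    const double mean = sum / n;
    double sq = 0.0;
    for (double x : scores) sq += (x - mean) * (x - mean);
    const double sd = std::sqrt(sq / (n - 1));
    const double half_width = 1.96 * sd / std::sqrt(n);
    return { mean, sd, mean - half_width, mean + half_width };
}
```

If the resulting interval does not contain zero, the difference is unlikely to be just luck.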
Using the competition results, I have modelled the playing strength of each bot. This model is imperfect, but a decent approximation, showing that the ranking could easily have been different.

Estimated playing strength, click to view large version.
Crashes and timeouts are ignored, so some competitors rank differently here than in the official results.
Regression model: outcome of a pair of matches = strength[player1] - strength[player2] + error.
This has been a useful tool leading up to the final competition, together with plots of the playing strength as Order or as Chaos (not shown). For example, I estimate that my bot plays relatively strongly as Chaos, even making it the strongest Chaos bot (with large uncertainty, though). Nevertheless, for the final competition we can do a bit better, since the final competition has been re-run: same bots, but new matches. With that, I calculate a 22% chance that I would have ended up above Tapani Utriainen, even though there was a ~200-point difference between us! Therefore, I recommend playing more matches in future competitions to counteract this uncertainty.
Appendix: on ideas, mistakes and making decisions
In general it is useful to gather ideas from the literature. Another useful resource is the Chess Programming Wiki. But once you have some ideas, how do you evaluate them?
I found that it is important to focus on the ideas with the largest expected impact, and to test each change with enough matches to be sure the improvement is real.
Another thing that I cannot stress enough: look at your data! By making plots you see unexpected relations, you see wrongly computed values and you find opportunities for improvement.