Creating a Spaghetti bot

January 23, 2022

Every year Codecup organizes a contest for a game like chess, go, or othello. But instead of playing the games yourself, you create a bot which plays the game! And instead of a well-known game like chess, an unfamiliar game is chosen.

This year, the game was Spaghetti. I have competed in previous years too, and this year I entered the competition with my bot called Carbonara. Here is my story about the challenges of Spaghetti and how I succeeded or failed in meeting them! I hope you gain some insight into how such bots work in general and, for the experts, pick up some new tricks to use in your own bots.

Spoiler: I only got things working in the last few days of the competition, but still ended up at place 9 of 56.

The game Spaghetti

Twirling paths from side to side

In the game Spaghetti, the players (blue and red) take turns placing tiles on a 9x7 board. The board starts with only two randomly placed tiles, and the game continues until all fields are filled.

Possible tiles:

A Spaghetti game. Source: Codecup

The tiles combine to form paths, as you can see in the picture. You can only gain or lose points by completing a path. This can happen in four ways:

A player's goal is to maximize their own points. This creates a tension between cooperating to build paths from the blue side to the red side and competing to place the last tile in such paths. Although these rules are quite simple, there are a lot of edge cases to get right, so initially I had my hands full fixing all the bugs.

See the full rules or an example game for more details. One of the contestants (Boris de Wilde) created a page to play Spaghetti online.

How to write a bot for two-player games?

And what did I do in my bot Carbonara

To create a good bot, for any kind of game, you need three key elements: a search algorithm that looks ahead through possible moves, an evaluation function that judges positions the search cannot finish, and a game representation that makes both of these fast.

I won't go into the details, but you can find lots more about this on the internet. The mathematical field researching these problems is called game theory. If you want to write bots yourself, a useful resource is the Chess Programming Wiki. Game playing also relates to the field of reinforcement learning.

How will the opponent play?

To cooperate or to compete, that is the question

A lot of games are zero-sum. In zero-sum games anything one player wins is lost by the opponent, e.g. win vs lose. That makes creating bots a bit easier: you assume the opponent plays the worst move for you, and vice versa (this is called the minimax strategy). If the opponent makes a mistake or does not play as you expect, you come out better than expected. Ignoring those suboptimal moves never costs you anything, which makes it easier to create a strong bot. Most games are zero-sum, but not all of them: the prisoner's dilemma is the most famous non-zero-sum game.
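
To make the idea concrete, here is a minimal negamax sketch (minimax in its compact form) for a zero-sum game. The game interface is hypothetical, not my actual bot code:

```python
def negamax(game, depth):
    """Minimax in its 'negamax' form for a zero-sum game: the opponent's
    gain is exactly our loss, so one function serves both players by
    flipping the sign of the score."""
    if depth == 0 or game.is_over():
        return game.evaluate()  # score from the view of the player to move
    best = float("-inf")
    for move in game.valid_moves():
        game.play(move)
        best = max(best, -negamax(game, depth - 1))
        game.undo(move)
    return best
```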

As you might expect by now, Spaghetti is not a zero-sum game: the contest ranking sums the points you gain across all your games, not your number of wins. Also, gaining points by completing paths does not mean your opponent loses those points. As a result, the opponent's points are not very relevant to optimize for. To illustrate this: imagine that in every game you can choose between gaining 2 points yourself or making the opponent lose 2 points. Over 20 games, that translates to you gaining 40 points, or to 20 different opponents each losing just 2 points. To win the competition, you should primarily focus on your own points. Thus, you cannot simply assume your opponent will play the move that is worst for you. After all, a wrong prediction may mean you lose points instead of gaining them, unlike with zero-sum games. There are two ways to handle this: play it safe and assume the worst case anyway, or explicitly model how the opponent is likely to play.

I ended up choosing the second approach. I model the opponent as playing in one of three ways, each with equal weight. The first is playing fully at random; the second is playing the worst move for me. The third is playing “a good move” for the opponent, i.e. one of its 6 best moves or one with an outcome similar to the best move. This opponent model also assumes I play in the same way. The idea is that I can exploit weak opponents and adjust well to strong opponents, yet don't make too many mistakes in creating opportunities for myself.
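
Here is a minimal sketch of how such a mixed opponent model can be averaged. It assumes a hypothetical helper has already searched one move deeper and produced, per opponent reply, my resulting value and the opponent's resulting value; my real implementation differs in the details:

```python
def expected_value(replies):
    """`replies` holds one (my_value, opp_value) pair per legal opponent
    move, as computed by searching one level deeper (hypothetical input).
    Returns my expected value, averaged over three equally weighted
    opponent models."""
    my_vals = [mine for mine, _ in replies]
    # Model 1: the opponent plays fully at random.
    random_play = sum(my_vals) / len(my_vals)
    # Model 2: the opponent plays the move that is worst for me.
    worst_case = min(my_vals)
    # Model 3: the opponent plays "a good move" for itself, approximated
    # here as one of its 6 best replies, averaged from my point of view.
    best_for_opp = sorted(replies, key=lambda r: r[1], reverse=True)[:6]
    good_play = sum(mine for mine, _ in best_for_opp) / len(best_for_opp)
    return (random_play + worst_case + good_play) / 3
```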

Is this a good approach? I'm not sure. I did well using it, but I saw enough games where my bot did obviously stupid things, naively assuming the opponent would not complete a path. Also, thinking more moves ahead didn't really improve the results. That's a bad sign, since for a sound strategy, thinking further ahead should improve the results.

Evaluation function

Because knowing a little is better than playing blind

At some point, there are too many moves for the search algorithm to search through. At that point you must estimate how many points you can still get in the remainder of the game. For Spaghetti, simply estimating 0 points already works, since points are scored during the game, so the points gained so far are already known. But you can do better with a good evaluation function.

For my evaluation function, I first computed some properties of the game position. I chose these properties hoping they provide useful information about how the game will play out from that point onwards. Examples of such properties: how many paths can still be made from the blue side to the red side, and how many of those unfinished paths are in immediate danger of being connected to the top or bottom side?

Two paths are possible from the blue side to the red side
Two paths are at danger of connecting to the bottom side
Source: Codecup

These properties are used as inputs for a simple neural network that produces the evaluation function's output: how many points I and the opponent will still gain in the game. Is that neural network better than just a linear function? I don't really know, since I did not have the time to evaluate this. For the experts, this is the setup I used: 28 properties, one hidden layer with 10 nodes, and a leaky ReLU activation function.
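
For illustration, here is a minimal forward pass matching that setup. The two-output layout (points for me and for the opponent) and the exact weight shapes are my assumptions:

```python
import numpy as np

def leaky_relu(x, alpha=0.01):
    return np.where(x > 0, x, alpha * x)

def evaluate(features, w1, b1, w2, b2):
    """28 input properties -> 10 hidden nodes (leaky ReLU) -> 2 outputs.
    Assumed shapes: features (28,), w1 (10, 28), b1 (10,), w2 (2, 10),
    b2 (2,). Outputs: expected remaining points for me and the opponent."""
    hidden = leaky_relu(w1 @ features + b1)
    return w2 @ hidden + b2

# Quick smoke test with random weights.
rng = np.random.default_rng(0)
params = [rng.normal(size=s) for s in [(10, 28), (10,), (2, 10), (2,)]]
print(evaluate(rng.normal(size=28), *params))
```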

To train this neural network, I used TD-learning. Using only the outcome of the game is problematic for training the early turns, since the outcome is not strongly related to decisions made that early. TD-learning, a reinforcement learning technique, solves this by training early turns on the evaluation function's predictions at later turns.
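
As a sketch of the idea (not my exact training code), here is how TD(lambda)-style targets can be computed for one recorded game, so that early turns are trained on the predictions of later turns rather than only on the noisy final outcome:

```python
def td_lambda_targets(values, rewards, lam=0.7):
    """Compute TD(lambda) training targets for one recorded game.
    values[t]  = current network prediction at turn t
    rewards[t] = points actually scored by the move played at turn t
    Each target blends the immediate points, the next prediction, and
    deeper targets, so early turns learn from later predictions."""
    n = len(values)
    targets = [0.0] * n
    g = 0.0  # lambda-return of the following turn; 0 past the final turn
    for t in reversed(range(n)):
        next_v = values[t + 1] if t + 1 < n else 0.0
        g = rewards[t] + (1 - lam) * next_v + lam * g
        targets[t] = g
    return targets
```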

Finally, I used quiescence search. Predicting the outcome of the game is hard if the next player can immediately score some points. That player will most likely choose such a move, so the general properties of the game situation become poor predictors of the final outcome. With quiescence search you keep running your search algorithm, but only through moves that score points. The evaluation function is only used in “quiet” situations.
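
A minimal zero-sum-style sketch of that idea, reusing the hypothetical interface from the negamax example and assuming a points_for(move) helper that returns the points a move scores immediately:

```python
def quiescence(game):
    """Quiescence search sketch: the evaluation function is only trusted
    once no immediate scoring move remains; until then, keep searching,
    but only through point-scoring moves."""
    best = game.evaluate()  # "stand pat": settle for the quiet evaluation
    for move in game.valid_moves():
        points = game.points_for(move)
        if points == 0:
            continue  # quiet moves are left to the normal search
        game.play(move)
        # Collect the immediate points, then continue from the opponent's view.
        best = max(best, points - quiescence(game))
        game.undo(move)
    return best
```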

How do I represent Spaghetti in my bot?

And its similarity to Donald Knuth's dancing links algorithm

In Spaghetti, the previously placed tiles are not important, except for the unfinished paths they form. So I want my game representation to support the following operations efficiently: follow unfinished paths, play and undo tiles, calculate the points gained from playing a tile, and get all valid moves.

To do that, I track, for every unplayed field, where its adjacent paths lead and how long these paths are. Let's look at an example:

Field 6:

  • Left: to 8's top, length 1
  • Top: to 3's bottom, length 0
  • Right: ...
  • Bottom: to 9's top, length 0

Field 8:

  • Left: to 7's right, length 0
  • Top: to 6's left, length 1
  • Right: to 9's left, length 0
  • Bottom: ...

As you can see, every unfinished path is tracked at both of its endpoint fields. Let's see what happens after playing a few more tiles, at fields 6, 2 and 3:

Field 6:

  • Left: to 8's top, length 1
  • Top: to 3's bottom, length 0
  • Right: ...
  • Bottom: to 9's top, length 0

Field 8:

  • Left: to 7's right, length 0
  • Top: to 4's right, length 5
  • Right: to 9's left, length 0
  • Bottom: ...

Field 8 contains the right information, but the information in field 6 did not change. That's fine: field 6 now has a tile on it, so I simply never look at its entries. Yet that stale information is exactly enough to restore field 8's information when undoing the move! This lets me support all the operations I wanted from the game representation efficiently. The same trick is used in Donald Knuth's dancing links implementation of Algorithm X.
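
Here is the core of that trick as a small Python sketch (my bot itself is written differently, and edge cases like board edges, loops, and scoring completed paths are omitted):

```python
# ends[(field, side)] = (far_field, far_side, length): the unfinished
# path leaving `field` at `side`, recorded at both of its endpoint fields.

def connect(ends, f, side1, side2):
    """A tile piece at field f joins the paths touching f's side1 and side2."""
    p_field, p_side, p_len = ends[(f, side1)]  # far end of the first path
    q_field, q_side, q_len = ends[(f, side2)]  # far end of the second path
    length = p_len + q_len + 1                 # the joined path runs through f
    # Only the two far endpoints are rewritten; the entries at f go stale.
    ends[(p_field, p_side)] = (q_field, q_side, length)
    ends[(q_field, q_side)] = (p_field, p_side, length)

def undo_connect(ends, f, side1, side2):
    """Reverse connect(): f's stale entries still hold the old far ends,
    which is exactly enough to restore them (the dancing links trick).
    Moves must be undone in reverse order of playing."""
    p_field, p_side, p_len = ends[(f, side1)]
    q_field, q_side, q_len = ends[(f, side2)]
    ends[(p_field, p_side)] = (f, side1, p_len)
    ends[(q_field, q_side)] = (f, side2, q_len)
```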

There is one remaining operation I want from the game representation: checking whether I have seen a position before. If so, I can reuse the calculations from the first time I encountered it (this is called memoization, or a transposition table). For this, I simply hash all the unfinished paths on the board. The hash can also be updated efficiently when playing and undoing moves, since at most four paths change when a tile is played.
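
One standard way to get that incremental behaviour is Zobrist-style XOR hashing, where each unfinished path contributes one key; a sketch (the details of my actual hash differ):

```python
def path_key(end_a, end_b, length):
    """Key for one unfinished path; end_a and end_b are (field, side)
    pairs. Sorting makes the key independent of the endpoint order."""
    return hash((min(end_a, end_b), max(end_a, end_b), length))

def update_hash(h, removed_paths, added_paths):
    """Update the position hash after a move: XOR out the paths the tile
    consumed and XOR in the paths it created (at most four in total),
    instead of rehashing the whole board."""
    for end_a, end_b, length in removed_paths:
        h ^= path_key(end_a, end_b, length)
    for end_a, end_b, length in added_paths:
        h ^= path_key(end_a, end_b, length)
    return h
```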

Time management

When to think hard and when not

There is one relevant topic I haven't touched upon yet: time management. In bot programming, time is your currency. Spend it on a slow but accurate evaluation function, or use it to look more moves ahead in your search algorithm. The tradeoff is your choice, and approaches at either end of the spectrum can work well. And whenever you improve your code and remove a bottleneck, your bot starts playing better, but another part becomes the new bottleneck.

In the Spaghetti contest, you get 30 seconds of computation time for every game. It's up to the contestants how to use that time, and you can often do better than spending the same amount on every move. For example, in the last few moves you can often simulate the complete game, so less time is needed there. Also, in some games the first few moves have less impact, so you need less time on them for a good result. But sometimes the reverse is true, e.g. in chess.

For my bot Carbonara, I use at most 0.5 seconds per move for the first 10 moves (turns 1-20), then at most 1.4 seconds for moves 11-27 (turns 21-54). The last few moves make do with 0.1 seconds each. I did not experiment further, but this seemed a good balance for how many moves ahead I could look.

The maximum turn time

My last trick regarding time management is time prediction. This works as follows:

  1. Start with looking one move ahead. Record the best move.
  2. Predict if looking one move further ahead fits in the allocated turn time. If not, stop and actually play the best move.
  3. Look ahead one move further. Interrupt the computation if it runs over the allocated turn time.
  4. Repeat from step 2.

This approach avoids wasting time on computations I cannot finish within the turn time. Next turn, I can use this leftover time!
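
The loop looks roughly like this (a sketch: search_best_move stands in for the actual search, and the growth factor is an assumed estimate of how much longer each extra ply takes):

```python
import time

def choose_move(game, budget_seconds, growth=4.0):
    """Iterative deepening with time prediction, as described above."""
    start = time.monotonic()
    depth = 1
    best = search_best_move(game, depth)  # step 1: look one move ahead
    while True:
        elapsed = time.monotonic() - start
        # Step 2: predict whether one more ply still fits in the budget.
        if elapsed * growth > budget_seconds - elapsed:
            return best  # it won't fit: play the best move found so far
        depth += 1
        # Step 3: search one ply deeper (a real bot would also interrupt
        # this search if it overruns the allocated turn time).
        best = search_best_move(game, depth)
```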

Competition result

What did all that effort get me

In the end, I was making changes right up to the last day, so more time and effort, along with new ideas, might have brought me a better bot. Even so, I did quite well, finishing at place 9 of 56! On average, I gained 25.9 points per game. The winner, Tapani Utriainen, gained an amazing 34.6 points per game.

My position in the ranking. Source: Codecup

Looking back at my bot, I could make improvements to any part of it. But after looking at some games and discussing on the forum, I have come to the conclusion that the opponent modelling is the most important part to improve. Well, that's a nice lesson for the coming years!

Altogether, I really enjoyed taking part in this competition. I got to learn more about applying a learned evaluation function, along with TD learning. And the non-zero-sum aspect was a really interesting challenge.

Have you become interested in creating a bot too? Then you can join me in a future Codecup competition! Also see my article about reaching second place in the 2023 competition.


Feel free to contact me at placeholder