Playing Rings and Poles with HTNs

In this tutorial, we’ll see how we can create an agent that uses Hierarchical Task Networks (HTNs) to play the game of Rings and Poles.

We’ll be using the command line interface (CLI) as well as the Python library. Install and configure the CLI if you haven’t yet. You can also initialize a directory for the sync command if you want to use it. That’s the easiest way to get started creating content and uploading it to the platform. For an example of using the sync command, see the Create and Run a Simple Pipeline tutorial.

We will create three things in this tutorial:

a package with the action space and HTN methods for the game,
an agent that uses the package to play the game, and
a Python script that interacts with the agent to show the game play.

For now, we can put everything that isn’t Python or YAML into the same file. Let’s call it packages/rings-and-poles.cog.

We can start by declaring a package and domain. A package is a collection of domains designed to work together. A domain is just a collection of actions, events, behaviors, functions, and HTN tasks or methods that deliver a particular capability. How you divide up your code into packages and domains is up to you.

package com.ursafrontier.tutorials

domain games.rings-and-poles

Action Space

The action space is the set of actions and events that we use in the HTN. We can think of actions as things an agent will want to do, and events as the things that happen in the environment in which the agent is running.

When an agent runs, it will plan out a sequence of actions that it will propose. The environment (the Python script mentioned above in the case of this tutorial) will then decide whether to accept or reject the proposed actions, resulting in one or more events that are returned to the agent.

For our game, we want two actions. We can use PlaceRing to put a ring on a pole to set up the game, and MoveRing to move a ring from one pole to another.

Each of these actions can either succeed or fail, so we’ll have four events: RingPlaced, RingFailedToPlace, RingMoved, and RingFailedToMove.

Each of these actions and events has parameters such as which ring and which pole. The parameter types are not enforced by the platform for now and are just for documentation purposes.

Generally, we use imperatives for actions and predicates for events, but this is only a convention and not enforced by the platform.

action PlaceRing {
    integer ring
    string pole
}

event RingPlaced {
    integer ring
    string pole
}

event RingFailedToPlace {
    integer ring
    string pole
}

action MoveRing {
    integer ring
    string from
    string to
}

event RingMoved {
    integer ring
    string from
    string to
}

event RingFailedToMove {
    integer ring
    string from
    string to
}
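The Python script later in this tutorial handles actions and events as dictionaries with domain, name, and params keys. As a rough sketch of that shape, a PlaceRing action and the matching RingPlaced event might look like the following. The helper make_event is hypothetical and only here for illustration; it is not part of the library.

```python
DOMAIN = 'games.rings-and-poles'

# An action shaped the way the adjudicator later in this tutorial reads it.
place_action = {
    'domain': DOMAIN,
    'name': 'PlaceRing',
    'params': {'ring': 3, 'pole': 'A'},
}


def make_event(action, name):
    """Hypothetical helper: build an event that echoes the action's parameters."""
    return {
        'domain': action['domain'],
        'name': name,
        'params': dict(action['params']),
    }


ring_placed = make_event(place_action, 'RingPlaced')
print(ring_placed)
```

Echoing the action’s parameters back in the event keeps the two sides easy to correlate, which is what the success and failure clauses in the HTN rely on.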

Behaviors

Behaviors represent the things an agent will react to in the environment. In our example, we can declare an event that acts as a trigger for an agent to start playing the game. In this case, we use an imperative to name the event because it’s acting as a command to the agent.

Each trigger in the behavior is declared using the when keyword followed by an event description. This could just be the event name, as used here, or an event with some of its parameters specified. For example, we could have when PlayGame(rings), rings > 3 { ... } to trigger only when playing the game with more than three rings.

All of the event parameters are available in the body of the trigger as local variables. In this case, we’re just passing the number of rings to play to the play HTN macro.

event PlayGame {
    integer rings
}

behavior PlayRingsAndPoles {
    when PlayGame() {
        play(rings)
    }
}

HTN Methods

We start by setting up the game and then moving the rings from one pole to another. We assume that the rings with a higher number are larger than rings with a lower number. We can imagine that the ring number is the size of the ring in centimeters.

play(rings) when rings > 0 {
    initialize-game(rings)
    move-rings(rings, "A", "C", "B")
}

This is an HTN macro, which is the same as a method except that it’s not attached to a task. A macro or a method can have preconditions, also called guards (the part after the when keyword). If the guard evaluates to false, the corresponding body is not considered when planning.

If we didn’t have a body (the part between the { and }), then this would be a task. A task may have preconditions and implications. These are not used by the planner, but they are used to document what the task expects and what any methods should do.

We aren’t using tasks or methods in this tutorial.

Game Initialization

We initialize the game by placing any rings that aren’t already on a pole on the first pole (A). This allows us to set up a game and give that setup to the agent when we create the agent instance, but the HTN we’re developing in this tutorial won’t know how to play the game if the rings don’t begin on the first pole. We leave it as an exercise to the reader to create a more flexible HTN that can handle arbitrary starting states.

There’s nothing to do if there aren’t any rings remaining to place.

initialize-game(rings) when rings == 0 { }

If there are rings remaining, we place the next ring on the first pole as long as it’s not already placed on a pole. The ring() predicate is true if the ring is on the given pole, and false otherwise. Predicates are part of the agent state. The implies ... clause changes the agent state. Implying a predicate marks it as true.

Once we’ve placed a ring, we recursively call initialize-game with one fewer ring to place the remaining rings.

initialize-game(ring) when ring > 0, not ring(ring, "A"), not ring(ring, "B"), not ring(ring, "C") {
    PlaceRing(ring=ring, pole="A") {
        RingPlaced(ring=ring) is success
        RingFailedToPlace(ring=ring) is failure
    }
    implies ring(ring, "A")
    initialize-game(ring - 1)
}

If the ring is already placed on a pole, we just recursively call initialize-game with one fewer ring to place the remaining rings.

initialize-game(ring) when ring > 0, ring(ring, "A") or ring(ring, "B") or ring(ring, "C") {
    initialize-game(ring - 1)
}
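For readers more comfortable with Python, the three initialize-game bodies can be mirrored by one recursive function. This is only an analogy, not how the planner runs: the on_pole dictionary stands in for the ring() predicate, the returned tuples stand in for proposed actions, and every PlaceRing is assumed to succeed.

```python
def initialize_game(ring, on_pole, actions=None):
    """Python analogy of the initialize-game macro.

    ring:    highest ring number still to consider
    on_pole: dict mapping ring number -> pole (stand-in for the ring() predicate)
    Returns the list of (action, ring, pole) tuples that would be proposed.
    """
    if actions is None:
        actions = []
    if ring == 0:
        # First body: nothing left to place.
        return actions
    if ring not in on_pole:
        # Second body: place the ring on pole A and record the implied state.
        actions.append(('PlaceRing', ring, 'A'))
        on_pole[ring] = 'A'
    # Second and third bodies both recurse with one fewer ring.
    return initialize_game(ring - 1, on_pole, actions)


# Ring 2 is already on a pole, so only rings 3 and 1 get placed.
print(initialize_game(3, {2: 'A'}))
# [('PlaceRing', 3, 'A'), ('PlaceRing', 1, 'A')]
```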

The planner will explore all of the possible ways to achieve a task. This means that if a method or macro has multiple bodies with guards that evaluate to true, the planner will consider all of them rather than the first one that matches.
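As a loose illustration of that idea (not the platform’s planner), the sketch below collects every body whose guard holds instead of stopping at the first match. The body names and the state dictionary are made up for the example; the guards are modeled on the three initialize-game bodies.

```python
def applicable_bodies(bodies, state):
    """Return the names of all bodies whose guard holds, not just the first."""
    return [name for name, guard in bodies if guard(state)]


# Guards loosely modeled on the three initialize-game bodies.
bodies = [
    ('done', lambda s: s['ring'] == 0),
    ('place', lambda s: s['ring'] > 0 and s['ring'] not in s['placed']),
    ('skip', lambda s: s['ring'] > 0 and s['ring'] in s['placed']),
]

print(applicable_bodies(bodies, {'ring': 2, 'placed': {3}}))
# ['place']
```

Because these guards are mutually exclusive, exactly one body applies in any state. Overlapping guards would give the planner more than one alternative to explore.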

Game Play

There are a number of ways to play the game. In this version, we assume that all of the rings begin on pole A and that we want to move them to pole C.

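We use the standard recursive solution to the Tower of Hanoi puzzle, as given in the Wikipedia article on that puzzle: move the smaller rings to the spare pole, move the largest ring to the destination, and then move the smaller rings on top of it.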

move-rings(ring, source, dest, temp) when ring > 0 {
    move-rings(ring - 1, source, temp, dest)
    MoveRing(ring=ring, from=source, to=dest) {
        RingMoved(ring=ring, from=source, to=dest) is success
        RingFailedToMove(ring=ring, from=source, to=dest) is failure
    }
    implies not ring(ring, source), ring(ring, dest)
    move-rings(ring - 1, temp, dest, source)
}

move-rings(ring, source, dest, temp) when ring == 0 { }
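The same recursion can be written in plain Python to see which moves the agent should end up proposing. This is a sketch for checking expectations, not part of the agent: it returns (ring, from, to) tuples instead of proposing MoveRing actions, and it assumes every move succeeds.

```python
def move_rings(ring, source, dest, temp, moves=None):
    """Python analogy of the move-rings macro; returns a list of (ring, from, to)."""
    if moves is None:
        moves = []
    if ring == 0:
        # Matches the empty body guarded by ring == 0.
        return moves
    # Move the smaller rings out of the way, onto the spare pole.
    move_rings(ring - 1, source, temp, dest, moves)
    # Move this ring to its destination.
    moves.append((ring, source, dest))
    # Move the smaller rings from the spare pole onto this ring.
    move_rings(ring - 1, temp, dest, source, moves)
    return moves


moves = move_rings(3, 'A', 'C', 'B')
print(len(moves))   # 7
print(moves[0])     # (1, 'A', 'C')
```

For n rings the recursion produces 2^n - 1 moves, so a nine-ring game takes 511 moves.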

That’s all there is to the HTN definitions. Now we can create an agent that uses these definitions to play the game.

Agent

Defining an agent is pretty simple. We just need to give it a name and a list of packages to use. The agent will have available all of the domains in the listed packages. We can select which behaviors to use from those domains.

Put the following content into a file named agents/rings-and-poles-agent.yaml.

name: com.ursafrontier.player
packages:
  - com.ursafrontier.tutorials
behaviors:
  - games.rings-and-poles.PlayRingsAndPoles

Playing the Game

Now we can create a Python script that will start up an agent and show the agent playing the game. We’ll use terminedia to draw the game in the terminal. This isn’t as fancy as a web page, but it gets the job done.

Game Mechanics

Let’s create a class that can hold game information. This class provides methods to implement the placement and movement of rings. This is where we put the logic to make sure rules aren’t broken, but we aren’t putting any information about the action space here. This is a general-purpose class for playing the game.

class RingsAndPolesGame:
    def __init__(self, n):
        self.n = n
        self.poles = {'A': [], 'B': [], 'C': []}

    def place_ring(self, pole, ring):
        # a ring can only be placed once
        for stack in self.poles.values():
            if ring in stack:
                return False
        # the ring number must be between 1 and n
        if ring > self.n or ring < 1:
            return False
        # if a smaller ring is already on the pole, then we can't place this ring
        for r in self.poles[pole]:
            if r < ring:
                return False
        self.poles[pole].append(ring)
        return True

    def move_ring(self, from_pole, to_pole, ring):
        if ring not in self.poles[from_pole]:
            return False
        # the ring must be the smallest on its pole, which means it is on top
        for r in self.poles[from_pole]:
            if r < ring:
                return False
        # the ring can't be moved on top of a smaller ring
        for r in self.poles[to_pole]:
            if r < ring:
                return False
        self.poles[from_pole].remove(ring)
        self.poles[to_pole].append(ring)
        return True
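The rule checks in this class are meant to preserve one invariant: on every pole, ring numbers strictly decrease from the bottom of the list to the top. A small check like the following can be handy in tests. The name poles_are_valid is a hypothetical helper, not part of the class; it takes a dictionary shaped like the poles attribute.

```python
def poles_are_valid(poles):
    """Return True if, on every pole, each ring is smaller than the one below it."""
    for stack in poles.values():
        # Compare each ring with the ring directly above it in the list.
        for lower, upper in zip(stack, stack[1:]):
            if upper >= lower:
                return False
    return True


print(poles_are_valid({'A': [3, 2, 1], 'B': [], 'C': []}))  # True
print(poles_are_valid({'A': [1, 3], 'B': [2], 'C': []}))    # False
```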

Drawing the Game

Next, let’s create a class that captures how we want to present the game while it’s being played. This class will use terminedia to draw the game in the terminal. We’re keeping it simple and redrawing everything each time rather than erasing and drawing the differences.

class GameViewer:
    def __init__(self, scr, game):
        self.scr = scr
        self.game = game

    def draw(self):
        self.scr.clear()
        self.draw_poles()
        self.draw_rings('A', 20)
        self.draw_rings('B', 40)
        self.draw_rings('C', 60)

    def draw_poles(self):
        n = self.game.n
        for i in [20, 40, 60]:
            self.scr.draw.line((i - n - 1, 20), (i + n + 1, 20))
            self.scr.draw.line((i, 20), (i, 20 - n - 1))

    def draw_rings(self, pole, center):
        rings = self.game.poles[pole]
        for i in range(len(rings)):
            self.scr.draw.line((center - rings[i], 20 - i - 1), (center + rings[i], 20 - i - 1))
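To check the layout without opening a terminal, the coordinate arithmetic from draw_rings can be pulled into a pure function. The sketch below is a hypothetical refactoring: ring_segments and its base_row parameter are not part of the class above. It assumes the same baseline row of 20 and returns the endpoints that would be passed to the line-drawing call.

```python
def ring_segments(rings, center, base_row=20):
    """Return ((x0, y), (x1, y)) endpoints for each ring, bottom ring first."""
    return [
        ((center - size, base_row - i - 1), (center + size, base_row - i - 1))
        for i, size in enumerate(rings)
    ]


# Ring 3 sits on the base of pole A (column 20), ring 1 on top of it.
print(ring_segments([3, 1], 20))
# [((17, 19), (23, 19)), ((19, 18), (21, 18))]
```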

Adjudicating Actions

We need to be able to tell the agent whether or not it succeeded in its action. We’ll use a simple function that takes a game and an action and returns any events that were generated by the action.

This is an example of translating between the action space of an agent and the simulation that the agent might be running in. In this case, the “simulation” is the game.

class GameAdjudicator:
    def __init__(self, game):
        self.game = game

    def adjudicate(self, actions):
        events = []
        for action in actions:
            if action['domain'] != 'games.rings-and-poles':
                continue
            if action['name'] == 'PlaceRing':
                if self.game.place_ring(action['params']['pole'], action['params']['ring']):
                    events.append({'domain': 'games.rings-and-poles', 'name': 'RingPlaced', 'params': action['params']})
                else:
                    events.append({'domain': 'games.rings-and-poles', 'name': 'RingFailedToPlace', 'params': action['params']})
            elif action['name'] == 'MoveRing':
                if self.game.move_ring(action['params']['from'], action['params']['to'], action['params']['ring']):
                    events.append({'domain': 'games.rings-and-poles', 'name': 'RingMoved', 'params': action['params']})
                else:
                    events.append({'domain': 'games.rings-and-poles', 'name': 'RingFailedToMove', 'params': action['params']})
        return events

The GameAdjudicator class takes the actions from the agent and applies them to the game state. It returns the list of events that the agent should receive in response to the actions.

We only care about actions in the games.rings-and-poles domain, so we ignore any other actions. In Python 3.10 and later, we could use a match statement to dispatch on the action name instead of the if/elif chain.

Event Loop

One last thing before we tie everything together: the event loop that will run the game. This is a simple loop that gets the actions from the agent, adjudicates them, and then draws the game. It runs until the agent has no more actions to propose.

def make_moves(drawer, running_agent, adjudicator, n):
    actions = running_agent.send_event('games.rings-and-poles', 'PlayGame', {'rings': n})
    while actions:
        events = adjudicator.adjudicate(actions)
        drawer.draw()
        actions = running_agent.send_events(events)

We can see that it’s pretty easy to work with the agent. We use send_event to send a single event to the agent and send_events to send a list of events. Each call returns the list of actions that the agent wants to take next. We then process those actions and send the resulting events back to the agent.
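The loop can also be exercised without the platform by swapping in a scripted stand-in for the running agent. Everything in this sketch is hypothetical test scaffolding: ScriptedAgent only mimics the two methods used above, run_loop repeats the loop without the drawing step, and echo_ack is a toy adjudicator.

```python
class ScriptedAgent:
    """Hypothetical stand-in for the running agent: replays fixed action batches."""

    def __init__(self, batches):
        self._batches = list(batches)
        self.events_seen = []

    def _next_batch(self):
        # An empty list tells the loop that the agent is finished.
        return self._batches.pop(0) if self._batches else []

    def send_event(self, domain, name, params):
        self.events_seen.append({'domain': domain, 'name': name, 'params': params})
        return self._next_batch()

    def send_events(self, events):
        self.events_seen.extend(events)
        return self._next_batch()


def run_loop(agent, adjudicate, n):
    """Same shape as make_moves, minus drawing; returns the number of rounds."""
    rounds = 0
    actions = agent.send_event('games.rings-and-poles', 'PlayGame', {'rings': n})
    while actions:
        events = adjudicate(actions)
        rounds += 1
        actions = agent.send_events(events)
    return rounds


def echo_ack(actions):
    """Toy adjudicator: answer every action with an event named 'Ack'."""
    return [{'domain': a['domain'], 'name': 'Ack', 'params': a['params']} for a in actions]


place = {'domain': 'games.rings-and-poles', 'name': 'PlaceRing',
         'params': {'ring': 1, 'pole': 'A'}}
agent = ScriptedAgent([[place], [place]])
print(run_loop(agent, echo_ack, 1))  # 2
```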

Putting it All Together

Now we have all the parts we need. Let’s put them together in a script that will play the game.

from ursactl.core.project import Project
from terminedia import Screen

def enter_number_of_rings_to_use():
    while True:
        try:
            k = input("How many rings do you want to use? (3 to 9) ")
            n = int(k)
        except ValueError:
            # not a number, so ask again
            continue
        # ask again if the number is outside the advertised range
        if 3 <= n <= 9:
            return n

def play_game():
    agent = Project().agent('com.ursafrontier.player')
    n = enter_number_of_rings_to_use()
    with Screen() as scr:
        game = RingsAndPolesGame(n)
        drawer = GameViewer(scr, game)
        adjudicator = GameAdjudicator(game)
        drawer.draw()
        with agent.run() as player:
            make_moves(drawer, player, adjudicator, n)

if __name__ == '__main__':
    play_game()