Alan Zhao

Jan 16, 2018

Optimal Rugby Team Selection

After taking a couple of optimization classes at the School of Management and School of Statistics, I've been thinking about problems through an optimization lens. One such problem I spend too much time on is picking lineups for my grad rugby team. The night before a game, I confer with the other leaders of the squad to determine what the strongest lineup will be. We discuss various groups, mixing players in different positions, and the talk usually takes an hour.

The decision is based on player ability and practice attendance, and is rooted in our qualitative feelings. I began thinking about whether the problem could be cast quantitatively and, if so, whether I could build a decision-making tool.

An hour of research (i.e. Google) showed that this is actually a long-solved problem in computer science: the assignment problem. Simply put, it is the task of minimizing the total cost of assigning n workers to m jobs, where assigning worker i to job j has a cost(i,j). Turns out it has common industry applications. For example, how can Uber minimize total customer wait time given a set of drivers and available jobs?

Many solution methods for the assignment problem are out there, but the classic one is the Hungarian Algorithm, and there is already a one-line SciPy implementation: scipy.optimize.linear_sum_assignment. As always, I'm amazed by how impressive the Python open source data stack is.
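To give a feel for the call, here is a minimal sketch on a made-up 3x3 cost matrix (the numbers are purely illustrative, not my team data). As far as I can tell, linear_sum_assignment returns the row and column indices of the minimum-cost assignment, which can then be used to index back into the matrix.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# Hypothetical costs: cost[i, j] is the cost of giving worker i job j.
cost = np.array([
    [4, 1, 3],
    [2, 0, 5],
    [3, 2, 2],
])

# One call solves the whole assignment problem.
workers, jobs = linear_sum_assignment(cost)

for worker, job in zip(workers, jobs):
    print(f"worker {worker} -> job {job} (cost {cost[worker, job]})")

# Total cost of the chosen assignment.
total_cost = cost[workers, jobs].sum()
print("total cost:", total_cost)
```

For this toy matrix, worker 0 takes job 1, worker 1 takes job 0, and worker 2 takes job 2, for a total cost of 5.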

My rugby problem, restated, is thus maximizing total team performance by assigning n players to m=15 positions, where each player has a score for each position. This knowledge is largely implicit in our captains' discussion, so it was not much more work to put it into a csv file. This file is the performance matrix: each player can play a subset of the positions at varying levels (0 - not at all, 3 - basic knowledge and practice, 5 - years of varsity level positional experience).


The sum of the scores of the selected players in their assigned positions is the total team performance metric, and maximizing this sum is the objective.
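Since the solver minimizes cost, maximizing the team score amounts to negating the matrix first. Here is a small sketch with invented players and scores (not my real data), using more players than positions so that someone is left on the bench. Recent SciPy releases also accept maximize=True, which should be equivalent to the negation used here.

```python
import numpy as np
import pandas as pd
from scipy.optimize import linear_sum_assignment

# Hypothetical performance matrix: rows are positions, columns are players.
# Scores use the 0-5 scale described above; the names are made up.
scores = pd.DataFrame(
    [[5, 3, 0, 0],
     [0, 4, 3, 0],
     [0, 0, 5, 4]],
    index=["Prop", "Hooker", "Lock"],
    columns=["Ann", "Bo", "Cy", "Di"],
)

# The solver minimizes cost, so negate the scores to maximize them.
rows, cols = linear_sum_assignment(-scores.values)

# Map each position to its selected player and total up the score.
lineup = {scores.index[r]: scores.columns[c] for r, c in zip(rows, cols)}
team_score = scores.values[rows, cols].sum()

print(lineup)
print("team score:", team_score)
```

In this toy case Ann, Bo and Cy are selected for a team score of 14, and Di is left out because no lineup including her scores higher.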

Code

I wrote an object that stores the players and positions and automates the initial selection as well as reselection for any potential injuries. It keeps track of the team's total performance score as well.

import numpy as np
from scipy.optimize import linear_sum_assignment
import pandas as pd


class Selections(object):
    """An object to optimally select a starting team given a performance csv."""

    def __init__(self, data):
        """Read in data. The csv needs one row per player and one column per
        position. No need to duplicate positions that have two spots."""
        # fills  all empty elements with 0
        self.data = pd.read_csv(data, index_col=0).fillna(0)
        self._duplicate_cols()
        self.data = self.data.transpose()

        self.orig_cost = self.data.values*-1
        self.cost = self.orig_cost

        # retrieve list of players and positions
        self.positions = self.data.index.tolist()
        self.players = self.data.columns.tolist()
        self.starting_lineup = {}
        self.starting_score = 0
        self.current_lineup = {}
        self.current_score = 0

    def _duplicate_cols(self,
                        names=('Prop', 'Lock', 'Flanker', 'Center', 'Wing')):
        """Duplicate columns for positions that have two spots on the field"""
        for name in names:
            second_position = name + '2'
            self.data[second_position] = self.data[name]
        # alphabetize columns
        self.data = self.data.reindex(sorted(self.data.columns), axis=1)

    def _create_lineup(self, rows, cols):
        """Returns a dictionary of positions keys and player values"""
        selections = {}

        for row, col in zip(rows, cols):
            position, player = self.positions[row], self.players[col]
            selections[position] = player

        return selections

    def pick_lineup(self, starting=True):
        """Solves lineup selection with the Hungarian algorithm"""

        if starting:
            self.reset()
            rows, cols = linear_sum_assignment(self.orig_cost)
            self.starting_score = self._team_score(self.orig_cost, rows, cols)
            self.starting_lineup = self._create_lineup(rows, cols)
            return self.starting_lineup

        else:
            rows, cols = linear_sum_assignment(self.cost)
            self.current_score = self._team_score(self.cost, rows, cols)
            self.current_lineup = self._create_lineup(rows, cols)
            return self.current_lineup

    def substitute_selection(self, player_list):
        """Remove the given players and rerun the selection from the remaining
        player pool"""
        for player in player_list:
            player_index = self.players.index(player)
            self.players.remove(player)
            self.cost = np.delete(self.cost, player_index, 1)
        current_lineup = self.pick_lineup(starting=False)
        return current_lineup

    def reset(self):
        """Reset the selection object to its original state"""
        self.orig_cost = self.data.values*-1
        self.cost = self.orig_cost

        # retrieve list of players and positions
        self.positions = self.data.index.tolist()
        self.players = self.data.columns.tolist()
        self.starting_lineup = {}
        self.starting_score = 0
        self.current_lineup = {}
        self.current_score = 0

    def _team_score(self, cost, rows, cols):
        """Return the team total score"""
        return cost[rows, cols].sum() * -1

Example Use

In [1]:
from selections_optimizer import Selections
YGRFC = Selections("Rugby_Optimization.csv")
In [2]:
# get our starting line up
YGRFC.pick_lineup()
Out[2]:
{'Center': 'Kyle S',
 'Center2': 'Fish',
 'Eight': 'Alex W',
 'Flanker': 'Cody',
 'Flanker2': 'Lorenzo',
 'Fly Half': 'Tariq',
 'Fullback': 'Adam',
 'Hooker': 'Colin',
 'Lock': 'Cam',
 'Lock2': 'Kevin',
 'Prop': 'Samuel',
 'Prop2': 'Alan',
 'Scrum Half': 'John',
 'Wing': 'Jonas',
 'Wing2': 'Yodi'}
In [3]:
# get our initial team play value
YGRFC.starting_score
Out[3]:
61.0
In [4]:
# what happens when three players get injured and need subs?
# Alan and John get direct subs. 
# Algorithm moves Fish from center to fly half, 
# and brings in a sub for his old center position.
YGRFC.substitute_selection(["Alan", "John", "Tariq"])
Out[4]:
{'Center': 'Kyle S',
 'Center2': 'Dane',
 'Eight': 'Alex W',
 'Flanker': 'Cody',
 'Flanker2': 'Lorenzo',
 'Fly Half': 'Fish',
 'Fullback': 'Adam',
 'Hooker': 'Colin',
 'Lock': 'Cam',
 'Lock2': 'Kevin',
 'Prop': 'Chase',
 'Prop2': 'Samuel',
 'Scrum Half': 'Thomas',
 'Wing': 'Jonas',
 'Wing2': 'Yodi'}
In [5]:
# Team's total score goes down
YGRFC.current_score
Out[5]:
56.0

Conclusion

"all models are wrong, but some are more wrong than others."

This model attempts to get at the very basic challenge of selecting teams, and does so. However, it misses many complicating factors, such as picking teams based on the opposition (e.g. selecting thematically for speed) or accounting for the interaction between two players who play particularly well together. Furthermore, if there are multiple optimal teams, it only shows one.
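On that last point, one way to see the ties for a very small squad is to brute-force every assignment and keep all of those that match the best score. This is only a sketch with invented numbers, and all_optimal_lineups is a hypothetical helper that is not part of the Selections class. It is also only feasible for a handful of players, since the number of assignments grows factorially. Note that doubled positions such as Center and Center2 share identical scores, so swapping those two players always produces a tied lineup.

```python
from itertools import permutations

import numpy as np

# Hypothetical scores: rows are positions, columns are players.
scores = np.array([
    [5, 5, 0],
    [5, 5, 0],
    [0, 3, 4],
])
positions = ["Center", "Center2", "Wing"]
players = ["Ann", "Bo", "Cy"]


def all_optimal_lineups(scores, positions, players):
    """Return the best total score and every lineup that reaches it."""
    n_pos = len(positions)
    totals = {}
    # Try each ordered choice of n_pos distinct players (brute force).
    for perm in permutations(range(len(players)), n_pos):
        totals[perm] = sum(scores[p, perm[p]] for p in range(n_pos))
    best = max(totals.values())
    lineups = [
        {positions[p]: players[c] for p, c in enumerate(perm)}
        for perm, total in totals.items() if total == best
    ]
    return best, lineups


best, lineups = all_optimal_lineups(scores, positions, players)
print(best, lineups)
```

Here two lineups tie at a score of 14, differing only in which of Ann and Bo is listed at Center versus Center2.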

Many of these issues could feasibly be coded in, but that would probably cost me more time to refactor and rewrite. Plus, the healthy debate among captains and coach in selecting and thinking through lineups is half the fun. It should be even more fun now that we have a decision-making tool.