This manual is for the CLOP Pools Package (version 1.1, 26 January 2006), which calculates optimal picks for sports betting pools.
Copyright © 2006 Bryan Clair
Copying and distribution of this file, with or without modification, are permitted in any medium without royalty provided the copyright notice and this notice are preserved.
The CLOP Pools Package is a suite of command line utilities for working with sports betting pools. The package generally implements algorithms from the article Optimal Strategies for Sports Betting Pools (Clair, Letscher 2005), but also includes the main algorithm from March Madness and the Office Pool (Kaplan, Garstka 2001). The package and supporting information are maintained at http://euler.slu.edu/~clair/pools.
The model used for a betting pool requires three inputs:
Pick sets are coded as single lines of ASCII text. The pools utilities read them from standard input and write them to standard output.
Generally, the package is intended to play well with Unix utilities such as cut, paste, and sort, and the programs are designed to fit nicely into pipelines. The programs fall, fseek, fsmooth, fcanon, tgreedy, and tcanon all generate pick sets on stdout. The programs fstats and tstats calculate interesting statistics for pick sets provided on stdin. For example, the command:
fcanon -qd nfl_data | fstats -n50 nfl_data
calculates the expected return for a bet on all the underdogs in a 50 player football pool described in file nfl_data.
This package uses the GNU Scientific Library (http://www.gnu.org/software/gsl/) for numeric computations. The programs fall, fseek, and fsmooth are all threaded for speed on multiprocessor machines.
Football pools consist of g games which are assumed to be independent. The number of games is limited by the number of bits in an unsigned int. Running a 16-game pool on a machine with 16-bit ints has not been tested and may cause problems.
A football pool data file contains all actual and perceived probabilities needed to model the pool. A line beginning with '#' is a comment and is ignored, as are blank lines. The file begins with a header record on a single line, followed by one game record line for each game in the pool.
The header record gives text names for each column of fields.
Each game record has from 3 to 16 whitespace separated fields:
Here is a sample data file:
# Data from week 3 of the 2005 NFL season
HOME AWAY SAGARIN ESPN YAHOO
BUF ATL 0.60 .459 .448
CHI CIN 0.48 .214 .228
DEN KC. 0.42 .278 .274
GB. TB. 0.41 .341 .334
IND CLE 0.87 .975 .970
MIA CAR 0.63 .206 .152
MIN NO. 0.51 .509 .533
NYJ JAX 0.64 .527 .480
PHI OAK 0.82 .940 .935
PIT NE. 0.61 .709 .619
SD. NYG 0.47 .635 .720
SEA ARZ 0.80 .877 .860
SF. DAL 0.67 .140 .125
STL TEN 0.54 .756 .763
In this file, the columns are the probability of the Home team winning as predicted by Sagarin ratings, the probability that a given participant in ESPN's football pool chose the Home team, and the probability that a given participant in Yahoo's football pool chose the Home team.
By default, the 3rd and 4th columns are used as the actual and perceived probabilities of Team 1 winning. To choose different columns, use the -A and -P options to the football programs. For example,
fseek -n150 nfl05_week3_data -PYAHOO
would search for the best picks in a 150 participant pool, using SAGARIN data as actual probabilities and YAHOO data as perceived probabilities.
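To make the file format and the column selection concrete, here is a minimal Python sketch of a reader. The name parse_pool is hypothetical, and the sketch assumes (as in the sample) that the first two columns hold team names and that -A and -P select columns by header name; it is an illustration, not CLOP's own parser.

```python
def parse_pool(text, actual=None, perceived=None):
    """Read a football pool data file (hypothetical helper, not part of CLOP).

    Returns (team1, team2, actual, perceived) tuples. `actual` and `perceived`
    are header names, mimicking -A and -P; by default the 3rd and 4th columns
    are used.
    """
    records = []
    for line in text.splitlines():
        # Blank lines and lines beginning with '#' are ignored.
        if not line.strip() or line.startswith("#"):
            continue
        records.append(line.split())
    header, games = records[0], records[1:]
    a_col = header.index(actual) if actual else 2
    p_col = header.index(perceived) if perceived else 3
    return [(g[0], g[1], float(g[a_col]), float(g[p_col])) for g in games]
```

Calling parse_pool(open("nfl05_week3_data").read(), perceived="YAHOO") would mirror the -PYAHOO example above.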
Picks are read and written as a whitespace separated list of winners. Names must match exactly and be in the same order as the team names in the associated pool data file. For example:
BUF CHI DEN TB. IND MIA MIN NYJ PHI NE. NYG SEA SF. TEN
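As an illustration of the picks format, the following Python sketch checks a picks line against the games of a data file and, using the independence assumption for football pools, computes the chance that every pick is correct. Both function names are hypothetical, and the games are assumed to be (team1, team2, actual probability that team1 wins) tuples.

```python
def read_picks(line, games):
    """Return one bool per game, True when the pick is the first-listed team."""
    picks = line.split()
    if len(picks) != len(games):
        raise ValueError("expected exactly one pick per game")
    flags = []
    for pick, (team1, team2, _a) in zip(picks, games):
        # Names must match exactly and follow the data file's game order.
        if pick not in (team1, team2):
            raise ValueError(f"{pick!r} does not play in {team1} vs {team2}")
        flags.append(pick == team1)
    return flags


def prob_all_correct(flags, games):
    """Probability that every pick wins, assuming independent games."""
    prob = 1.0
    for picked_first, (_t1, _t2, a) in zip(flags, games):
        prob *= a if picked_first else 1.0 - a
    return prob
```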
Programs to generate picks:
Programs for calculating statistics:
Calculate expected return for all possible picks. Writes 2^g lines of output. Each line shows the expected return for a set of picks, a tab character, and then the picks in the above format.
It is useful to pipe the output of fall to sort -nr to get a list sorted in descending order of quality.
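Each game has two possible winners, so a pool of g games has 2^g pick sets. A bitmask with one bit per game is a natural encoding, consistent with the unsigned int limit mentioned above, and this Python sketch assumes it. It enumerates the pick sets and imitates the output of fall piped through sort -nr. The score function is a hypothetical stand-in, not CLOP's expected return.

```python
def all_pick_sets(games):
    """Yield all 2**g pick sets; bit k of mask set means team1 wins game k."""
    for mask in range(1 << len(games)):
        yield [team1 if (mask >> k) & 1 else team2
               for k, (team1, team2) in enumerate(games)]


def fall_like(games, score):
    """'score<TAB>picks' lines sorted best first, like `fall ... | sort -nr`."""
    scored = [(score(picks), " ".join(picks)) for picks in all_pick_sets(games)]
    scored.sort(key=lambda item: item[0], reverse=True)
    return [f"{value:.6f}\t{picks}" for value, picks in scored]
```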
Usage:
fall [-tqnAP] datafile
-q
-t<threads>
-n<competitors>
-A<actuals>
-P<perceiveds>
fsmooth operates exactly the same as fall, except that the normal approximation is used to calculate the expected return for each set of picks. fsmooth is considerably faster than fall.
It is useful to pipe the output of fsmooth to sort -nr to get a list sorted in descending order of quality.
Usage:
fsmooth [-tqnAP] datafile
-q
-t<threads>
-n<competitors>
-A<actuals>
-P<perceiveds>
Hill-climbing search for a pick set which is a local maximum for expected return. The search begins with a good guess based on typical results. The search ends at a pick set which has larger expected return than any other set which differs by at most two games.
fseek is very effective at finding the best pickset quickly. However, it might in theory become stuck at a local maximum which is not the global maximum.
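The stopping rule above can be sketched in Python as a first-improvement hill climb over all pick sets that differ from the current one in one or two games. Picks are booleans (True meaning the first-listed team), and objective is a hypothetical placeholder for expected return; fseek's actual starting guess and evaluation are not reproduced here.

```python
from itertools import combinations


def hill_climb(start, objective):
    """Climb until no pick set within two flipped games scores higher.

    Returns (local maximum, its objective value).
    """
    best = list(start)
    best_val = objective(best)
    g = len(best)
    # Neighborhood: flip a single game, or flip any pair of games.
    flips = [(k,) for k in range(g)] + list(combinations(range(g), 2))
    improved = True
    while improved:
        improved = False
        for flip in flips:
            cand = list(best)
            for k in flip:
                cand[k] = not cand[k]
            val = objective(cand)
            if val > best_val:
                best, best_val, improved = cand, val, True
                break  # first improvement: rescan from the new point
    return best, best_val
```

The two-game neighborhood matters when two picks only pay off together: in that case no single flip improves the objective, but the pair flip does.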
Usage:
fseek [-tqnAP] datafile
-q
-t<threads>
-n<competitors>
-A<actuals>
-P<perceiveds>
Calculate canonical picks for a football pool. Canonical picks include the favorites, the underdogs, and the edge picks (displayed in that order if more than one is requested). Favorites and underdogs are determined from the actual probabilities, so to see perceived favorites or underdogs, use the -A option to select a perceived column as the actuals. The edge picks maximize the ratio A/P of actual to perceived probability, and are optimal for a sufficiently large number of competitors.
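The three kinds of canonical pick can be sketched per game in Python, using (team1, team2, a, p) tuples where a and p are the actual and perceived probabilities that team1 wins. The function name and the tie-breaking rule (ties go to team1) are assumptions. For 0 < p < 1, team1's ratio a/p is at least team2's ratio (1-a)/(1-p) exactly when a(1-p) >= p(1-a), that is, when a >= p; comparing a with p directly avoids dividing by zero.

```python
def canonical_picks(games):
    """Favorites, underdogs and edge picks for (team1, team2, a, p) tuples."""
    favorites = [t1 if a >= 0.5 else t2 for t1, t2, a, p in games]
    underdogs = [t2 if a >= 0.5 else t1 for t1, t2, a, p in games]
    # Larger A/P ratio: a/p >= (1-a)/(1-p) holds exactly when a >= p.
    edge = [t1 if a >= p else t2 for t1, t2, a, p in games]
    return favorites, underdogs, edge
```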
Usage:
fcanon [-qfdeAP] datafile
-q
-A<actuals>
-P<perceiveds>
-f
-d
-e
Calculate statistics for pick sets read on standard input. The default behavior is to print the expected return followed by the picks.
Usage:
fstats [-qnAPsdgv] datafile
-q
-n<competitors>
-A<actuals>
-P<perceiveds>
-s
-d
-g
-v<actual spread>
A tournament pool involves picking all games of an R-round single-elimination tournament with 2^R teams. Currently, the maximum number of allowable rounds is 14 (which is ridiculously large).
With tournament pools, the scoring method is variable. In this version of CLOP, only two scoring methods are implemented: power-of-two scoring and ESPN scoring. In power-of-two scoring, correct picks are worth 1, 2, 4, 8, ... in increasing rounds. In ESPN scoring, correct picks are worth 10, 20, 40, 80, 120, and 160 points in increasing rounds. Any tournament program that uses scoring will use power-of-two scoring by default and accept the -E option to switch to ESPN scoring.
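A worked check of the two scoring schemes: round r of an R-round tournament has 2^(R-r) games, so a perfect bracket scores the sum over rounds of 2^(R-r) times the round-r value. The Python sketch below (function names hypothetical) computes that maximum. Because six ESPN values are listed above, it applies ESPN scoring only to a 6-round, 64-team bracket.

```python
def round_values(rounds, espn=False):
    """Value of a correct pick in rounds 1..R."""
    if espn:
        if rounds != 6:
            raise ValueError("this sketch applies ESPN scoring to 6 rounds only")
        return [10, 20, 40, 80, 120, 160]
    return [2 ** (r - 1) for r in range(1, rounds + 1)]  # 1, 2, 4, 8, ...


def perfect_score(rounds, espn=False):
    """Score of a bracket with every pick correct."""
    values = round_values(rounds, espn)
    return sum(2 ** (rounds - r) * values[r - 1] for r in range(1, rounds + 1))
```

For a 64-team tournament this gives 192 points under power-of-two scoring (each round is worth 32 in total) and 1680 points under ESPN scoring.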
A tournament pool is described by three collections of data: team names, actual probabilities, and perceived probabilities. A collection of probabilities can be given in one of two ways, as head-to-head data or as winround data.
Within data files, team order is important, because it determines which teams play in which round (using the usual single elimination bracket) and it must remain consistent for all files used in a given pool.
In the sections below, T is the number of teams in the tournament and R is the number of rounds.
A team names file begins with a header line containing the keyword
names followed by the number of teams (T) in the tournament,
followed by an optional comment to the end of the line.
Each subsequent line contains a team name, which may optionally
be in double quotes. Quotes are useful to include whitespace in
the team name, which makes ASCII picks output much nicer.
Here is an example tournament with four teams. The first round matchups are Aardvarks-Bison and Chihuahuas-Ducks.
names 4 Bryan's Imaginary Playoffs
"Aardvarks "
"Bison "
"Chihuahuas"
"Ducks "
A head-to-head data file begins with a header line containing the keyword
h2h followed by the number of teams T in the tournament,
followed by an optional comment to the end of the line.
Data follows as T*T floating point numbers, in order:
P(0 beats 0) P(0 beats 1) .. P(0 beats T-1)
...
P(T-1 beats 0) .. P(T-1 beats T-1)
The data is redundant since P(i beats j) = 1 - P(j beats i). Values for P(x beats x) are required but ignored.
Here is an example that goes with Bryan's Imaginary Playoffs:
h2h 4 Close. Team 2 (Bison) have an edge.
.5 .4 .4 .7
.6 .5 .7 .6
.6 .3 .5 .6
.3 .4 .4 .5
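A minimal Python reader for this format, which also checks the redundancy P(i beats j) = 1 - P(j beats i) up to rounding. The name parse_h2h is hypothetical, and the sketch assumes the T*T numbers may be split across lines in any way.

```python
def parse_h2h(text):
    """Return (T, comment, m) where m[i][j] = P(i beats j)."""
    header, _, rest = text.partition("\n")
    fields = header.split(None, 2)
    if not fields or fields[0] != "h2h":
        raise ValueError("not an h2h file")
    teams = int(fields[1])
    comment = fields[2] if len(fields) > 2 else ""
    values = [float(v) for v in rest.split()]
    if len(values) != teams * teams:
        raise ValueError(f"expected {teams * teams} numbers, got {len(values)}")
    m = [values[i * teams:(i + 1) * teams] for i in range(teams)]
    # Off-diagonal pairs must sum to 1; P(x beats x) is ignored.
    for i in range(teams):
        for j in range(i + 1, teams):
            if abs(m[i][j] + m[j][i] - 1.0) > 1e-9:
                raise ValueError(f"P({i} beats {j}) + P({j} beats {i}) != 1")
    return teams, comment, m
```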
A winround data file begins with a header line containing the keyword
winround followed by the number of teams T in the tournament,
followed by an optional comment to the end of the line.
Data in the file comes in two series, the solo and pair series.
The solo series begins with
the keyword solo followed by the probabilities
of team i winning round r for all i,r.
The pair series begins with the keyword pair followed by
the probabilities of team i winning round r and team j winning round s for all i,j,r,s.
The pair series is optional. If it is omitted, the data no longer contains enough information for the theoretical model of the pool. In that case, CLOP will estimate the pair data and print a warning message to stderr. See the Optimal Strategies paper for details.
The solo series has size T * (R+1), in order:
P(0->0) P(0->1) ... P(0->R) P(1->0) ... P((T-1)->R)
The pair series has size T * T * (R+1) * (R+1), in order:
P(0->0 & 0->0) P(0->0 & 0->1) .. P(0->0 & 0->R)
P(0->1 & 0->0) .. P(0->1 & 0->R)
...
P(0->R & 0->0) .. P(0->R & 0->R)
P(0->0 & 1->0) P(0->0 & 1->1) .. P(0->0 & 1->R)
...
P(0->R & 1->0) .. P(0->R & 1->R)
...
P(0->R & (T-1)->0) .. P(0->R & (T-1)->R)
P(1->0 & 0->0) ...
P((T-1)->R & (T-1)->R)
Here is an example that goes with Bryan's Imaginary Playoffs:
winround 4 Team 2 (Bison) very strong.
solo
1.000 0.300 0.180
1.000 0.700 0.525
1.000 0.500 0.125
1.000 0.500 0.170
pair
1.000 0.300 0.180
0.300 0.300 0.180
0.180 0.180 0.180
1.000 0.700 0.525
0.300 0.000 0.000
0.180 0.000 0.000
(14x9 more floats)....
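Reading the layout above, the second round index varies fastest, then the first round index, then the second team, then the first team, so P(i->r & j->s) sits at 0-based position ((i*T + j)*(R+1) + r)*(R+1) + s of the pair series. A Python sketch of both index formulas (function names hypothetical), checked against the example values:

```python
def solo_index(i, r, rounds):
    """0-based position of P(i->r) in the solo series."""
    return i * (rounds + 1) + r


def pair_index(i, r, j, s, teams, rounds):
    """0-based position of P(i->r & j->s) in the pair series."""
    width = rounds + 1
    return ((i * teams + j) * width + r) * width + s
```

For example, P(Aardvarks->0 & Bison->1) is at position 10 of the pair series and equals 0.700.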
A set of picks for a tournament is stored in “depth format” as a list of integers in the range [1...R+1], one for each team. The number for each team indicates which round that team will reach.
In Bryan's Imaginary Playoffs, here is a bracket in which the Bison beat the Chihuahuas in the finals:
1 3 2 1
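Not every list of depths is a legal bracket. Round r is played within aligned blocks of 2^r consecutive teams, so a depth-format pick set is consistent exactly when each such block contains one team with depth at least r+1. A Python sketch of that check (is_valid_bracket is a hypothetical helper):

```python
def is_valid_bracket(depths):
    """Check depth-format picks for a 2**R-team single-elimination bracket."""
    teams = len(depths)
    rounds = teams.bit_length() - 1
    if teams < 2 or teams != 1 << rounds:
        return False
    if any(not 1 <= d <= rounds + 1 for d in depths):
        return False
    # Exactly one team per aligned block of 2**r teams wins round r,
    # i.e. reaches depth r + 1 or more.
    for r in range(1, rounds + 1):
        size = 1 << r
        for start in range(0, teams, size):
            winners = sum(1 for d in depths[start:start + size] if d >= r + 1)
            if winners != 1:
                return False
    return True
```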
The tstats program can display brackets in a human readable ASCII format. The pix2tex utility can create a TeX file that displays the bracket graphically.
Programs to generate picks:
Programs for calculating statistics:
Utility programs:
Performs a hill-climbing search for picks that maximize expected return. Each trial chooses a random starting pick (uniformly distributed over the set of all possible brackets) and hill climbs to a local maximum. The process is repeated for the specified number of trials. Picks that improve on previous results are displayed when found.
Usage:
tseek [-nEqvts] namesfile actualfile perceivedfile
-n<competitors>
-E
-q
-v
    tseek -v -t1 ... is a good way to get a feel for the hill-climbing process.
-t<trials>
-s<seed>
Display canonical statistics and picks for a tournament pool. The statistics (shown unless -q is used) describe opponent scoring. The six sets of picks are:
Usage:
tcanon [-Eq] namesfile actualfile perceivedfile
-E
-q
Generate random picks. Each game is 50-50 unless the optional datafile is given to specify the probabilities.
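A Python sketch of generating random picks in depth format: winners of neighboring bracket slots meet in the next round, and each game is a coin flip unless an optional h2h matrix supplies P(i beats j). The function name random_bracket and the h2h parameter (standing in for trandom's optional datafile) are hypothetical; this is not trandom's source.

```python
import random


def random_bracket(rounds, rng=random, h2h=None):
    """Depth-format picks for 2**rounds teams.

    Every game is 50-50 unless h2h (hypothetical: h2h[i][j] = P(i beats j))
    is given. Winners of neighboring slots meet in the next round.
    """
    depths = [1] * (1 << rounds)        # every team reaches round 1
    alive = list(range(1 << rounds))    # teams still in, in bracket order
    for r in range(1, rounds + 1):
        winners = []
        for a, b in zip(alive[0::2], alive[1::2]):
            p_a = 0.5 if h2h is None else h2h[a][b]
            w = a if rng.random() < p_a else b
            depths[w] = r + 1           # w won its round-r game
            winners.append(w)
        alive = winners
    return depths
```

Printing " ".join(map(str, random_bracket(2))) gives a line in the same form as the 1 3 2 1 example.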
Usage:
trandom [-nR] [datafile]
-n<count>
-R<rounds>
Calculate statistics for pick sets read on standard input. After reading input, tstats produces a header with the comments from the input files and statistics describing opponent scores. Then, for each set of picks on stdin, tstats displays the picks in a human-readable ASCII form and displays statistics for the picks. The statistics are:
expected return
actual probability
actual mean score, actual score standard deviation
perceived probability
perceived mean score, perceived score standard deviation
correlation with opponents
Usage:
tstats [-nEqsetP] namesfile actualfile perceivedfile
-n<competitors>
-E
-q
-s
-e
-t
-P
Quick and dirty program to calculate the scores of picks on stdin, given a set of picks as the input file outcome.
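In depth format, scoring reduces to one comparison per team: a pick that team i wins round r is correct when both the picks and the outcome give team i a depth of at least r+1, so team i contributes the values of rounds 1 through min(pick, outcome) - 1. A Python sketch (score_picks is hypothetical; values[r-1] is the worth of a correct round-r pick under whichever scoring method is in use):

```python
def score_picks(picks, outcome, values):
    """Score depth-format picks against a depth-format outcome."""
    total = 0
    for p, o in zip(picks, outcome):
        # Rounds 1 .. min(p, o) - 1 were both picked and won by this team.
        for r in range(1, min(p, o)):
            total += values[r - 1]
    return total
```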
Usage:
tscore [-E] [-r rounds] outcome
-E
-r rounds
Simulate tournaments. Computes results for each set of picks Y read on standard input. Each trial chooses one pick using actual probabilities, then n competitor picks using perceived probabilities. After all trials are finished, the summary results for picks Y are displayed.
Usage:
tsim [-nEqst] namesfile actualfile perceivedfile
-n<competitors>
-E
-q
-s<seed>
-t<trials>
Utility program to read in a probability file and dump a correctly formatted probability file in h2h format. Useful for converting winround to h2h.
Usage:
dumph2h probfile
Utility program to read in a probability file and dump a correctly formatted probability file in winround format. Useful for converting h2h to winround, because the solo information is interesting for h2h files generated from computer rankings.
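For intuition about the solo series such a conversion produces, here is the usual bracket recursion in Python, assuming independent games: team i wins round r with probability P(i wins round r-1) times the sum, over possible round-r opponents j, of P(j wins round r-1) * P(i beats j). This is a sketch (solo_from_h2h is hypothetical), not dumpwinround's source, and it computes only the solo series.

```python
def solo_from_h2h(h2h):
    """solo[i][r] = P(team i wins round r), r = 0..R, from h2h[i][j] = P(i beats j)."""
    teams = len(h2h)
    rounds = teams.bit_length() - 1       # teams == 2**rounds
    solo = [[1.0] + [0.0] * rounds for _ in range(teams)]
    for r in range(1, rounds + 1):
        half = 1 << (r - 1)               # each round-r block has two halves
        for i in range(teams):
            start = i - i % (2 * half)    # first team of i's round-r block
            if i - start < half:          # opponents come from the other half
                opponents = range(start + half, start + 2 * half)
            else:
                opponents = range(start, start + half)
            win = sum(solo[j][r - 1] * h2h[i][j] for j in opponents)
            solo[i][r] = solo[i][r - 1] * win
    return solo
```

With the h2h example for Bryan's Imaginary Playoffs, the title chances come out as 0.208, 0.396, 0.252 and 0.144, which sum to 1.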
Usage:
dumpwinround [-p] probfile
-p
Calculate a table of winround data from a list of picks.
Given a series of picks on either stdin or in picksfile,
pcalc computes solo and pair data by counting occurrences of teams
reaching rounds. Dumps results to stdout as a winround format file.
This is how you get perceived probabilities if you have a large
collection of opponent picksets.
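A Python sketch of this counting, assuming depth d means a team reached rounds 0 through d-1 (consistent with P(i->0) = 1 in the winround example). The nested pair[i][j][r][s] order matches the pair series layout, so flattening it in that order gives the file order. winround_from_picks is a hypothetical name, not pcalc's source.

```python
def winround_from_picks(pick_sets):
    """Solo and pair frequencies from depth-format pick sets (T = 2**R teams)."""
    n = len(pick_sets)
    teams = len(pick_sets[0])
    width = teams.bit_length()            # R + 1 when teams == 2**R
    solo = [[0] * width for _ in range(teams)]
    pair = [[[[0] * width for _ in range(width)] for _ in range(teams)]
            for _ in range(teams)]
    for depths in pick_sets:
        for i, di in enumerate(depths):
            for r in range(di):           # depth d: rounds 0 .. d-1 reached
                solo[i][r] += 1
                for j, dj in enumerate(depths):
                    for s in range(dj):
                        pair[i][j][r][s] += 1
    # Convert counts to frequencies.
    solo = [[c / n for c in row] for row in solo]
    pair = [[[[c / n for c in row] for row in block] for block in per_i]
            for per_i in pair]
    return solo, pair
```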
Usage:
pcalc [-r<rounds>] [picksfile]
-r<rounds>
From a set of picks and a tournament names file, pix2tex generates LaTeX output to draw a filled-in bracket.
Width and height are specified as floating point numbers and are used to position
the elements of the bracket. LaTeX will interpret these as points, by default,
although you could change \unitlength in your document to adjust this.
Usage:
pix2tex [-h<height>] [-w<width>] namesfile