This article was adapted from Microsoft Office Excel 2007 Data Analysis and Business Modeling by Wayne L. Winston. Visit Microsoft Learning to learn more about this book.
This classroom-style book was developed from a series of presentations by Wayne Winston, a well known statistician and business professor who specializes in creative, practical applications of Excel. So be prepared — you may need to put your thinking cap on.
In this article
Can I use Excel to set NFL point spreads?
Many of us follow basketball, football, hockey, and baseball. Bookmakers set point spreads on games in all these sports and others. For example, the bookmakers’ best guess was that the Pittsburgh Steelers would win the 2006 Super Bowl by three points. (Boy, did the Colts choke in the divisional round! Of course, the Colts redeemed themselves by winning the 2007 Super Bowl.) How can you use Microsoft Office Excel 2007 to come up with team “ratings” that generate reasonable point spreads?
Using a simple Solver model, you can generate reasonable point spreads for teams based on the scores of the 2005 season. (Playoff games for this season occurred in 2006). Our work is in file Nfl2005.xlsx (see Figure 32-1). We simply use the score of each game of the NFL 2005 season as input data. The changing cells for the Solver model will be a rating for each team and the size of the home field advantage. For example, if the Indianapolis Colts have a rating of +5 and the New York Jets have a rating of +7, the Jets are considered two points better than the Colts.
With regard to the home-field edge, in most years, college and professional football teams, as well as professional basketball teams, tend to win by an average of three points (whereas home college basketball teams tend to win by an average of five points). In our model, however, we will define the home edge as a changing cell and have our Solver model estimate the home edge. We can define the outcome of an NFL game to be the number of points by which the home team outscores the visitors. We can predict the outcome of each game by using the following equation (which I’ll refer to as Equation 1):
(1)(Predicted points by which home team outscores visitors)=(Home edge)+(Home team rating)–(Away Team rating)
For example, if the home field edge equals three points, when the Colts host the Jets, the Colts will be a one-point favorite (3+5–7). If the Jets host the Colts, the Jets will be a five-point favorite (3+7-5). A complication that arose in the 2005 season is that Hurricane Katrina caused the New Orleans Saints (Team 20) to play games at a nearby site (San Antonio, Texas or Baton Rouge, Louisiana). (This is indicated by a C in column A.) We assigned 0.5 home-field advantage for the Saints in these games. Also Game 59 (San Francisco 49ers against Arizona Cardinals) was played in Mexico City (a neutral site) so no home-field advantage should be assigned for this game.
What target cell will yield "good" ratings? Our goal is to find the set of values for team ratings and home-field advantage that best predict the outcome of all games. In short, we want the prediction for each game to be as close as possible to the outcome of each game. This suggests that we want to minimize the sum over all games of (Actual outcome)–(Predicted outcome). The problem with using this target is that positive and negative prediction errors cancel each other out. For example, if we overpredict the home-team margin by 50 points in one game and underpredict the home-team margin by 50 points in another game, our target cell would yield a value of 0, indicating perfect accuracy, when in fact we were off by 50 points a game. We can remedy this problem by minimizing the sum over all games by using [(Actual Outcome)– (Predicted Outcome)]2. Now positive and negative errors will not cancel each other out.
Can I use Excel to set NFL point spreads?
Let’s now see how to determine accurate ratings for NFL teams by using the scores from the 2005 regular season. You can find the data for this problem in the file Nfl2005.xlsx, which is shown in Figure 32-1. Note that I’ve hidden the ratings of teams 8–18 so that the ratings and model would fit on one screen.
To begin, I named the range E3:E34, which contains each team’s rating, rating. I also named (for reasons that will soon become apparent) the range C3:E34 lookup2. I placed a trial home-field advantage value in cell G2.
Starting in row 39, columns C and D contain the team code number (listed in C3:C34) for the home and away team for each game. For example, the first game (listed in row 39) is the Oakland Raiders (Team 23) playing at the New England Patriots (Team 19). Column E contains the home team’s score, and column F contains the visiting team’s score. As you can see, the Patriots beat the Raiders 30-20. I can now compute the outcome of each game (the number of points by which the home team beats the visiting team) by entering the formula =E39–F39 in cell G39. By pointing to the lower-right portion of this cell and double-clicking the left mouse button, you can copy this formula down to the last game, which appears in row 304. (By the way, an easy way to select all the data is to press Ctrl+Shift+Down Arrow key. This key combination takes you to the last row filled with data—row 304 in this case.)
In column H, I use Equation 1 to generate the prediction for each game. The prediction for the first game is computed in cell H39:
This formula creates a prediction for the first game by adding the home edge to the home-team rating and then subtracting the visiting-team rating. Note that VLOOKUP(C39,lookup2,3) locates the home-team rating by using the home-team code number in column C, whereas VLOOKUP(D39,lookup,3) looks up the visiting team’s rating by using the visiting team’s code number in column D. (For more information about using lookup functions, see Chapter 3, "Lookup Functions.") The IF statement picks up the full home advantage in most games, 0.5*Home edge in Saints home games, and no home advantage for the Mexico City game.
In column I, I compute the error (actual score–predicted score) for each game. Our error for the first game is computed in cell I36 with the formula =G39–H39. In column J, I compute the squared error for each game. The squared error for the first game is computed in cell J39 with the formula =I39^2. After selecting the cell range H36:J36, I copied the formulas down to the bottom of our spreadsheet (H304:J304).
In cell J37, I’ve computed our target cell by summing all the squared errors with the formula SUM(J36:J304). (You can enter a formula for a large column of numbers such as this by typing =SUM( and then selecting the first cell in the range you want to add together. Press Ctrl+Shift+Down Arrow key to enter the range from the cell you’ve selected to the bottom row in the column and then add the closing parenthesis.)
It is convenient to make our average team rating equal to 0. A team with a positive rating is better than average and a team with a negative rating is worse than average. I’ve computed the average team rating in cell E1 with the formula AVERAGE(E3:E34).
I can now fill in the Solver Parameters dialog box as shown in Figure 32-2.
We minimize the sum of our squared prediction errors for all games (computed in cell J37) by changing each team’s rating (listed in cells E3:E34) and the home advantage (cell G2). The constraint E1=0 ensures that the average team rating is 0. From Figure 32-1, we find that the home team has an advantage of 3.5 points over the visiting team. Our 10 highest-rated teams are shown in Figure 32-3. Remember that the ratings listed in cell range E3:E34 were computed by Solver. In our template file we could start with any numbers in these cells and Solver would still find the "best" ratings.
Because the Colts are the highest rated AFC team and the Seahawks are the highest rated NFC team, we would have predicted that the Colts would play the Seattle Seahawks in the Super Bowl. Unfortunately, our Colts did not make it. Our data includes all playoff games for the 2005 season, so our prediction for the Super Bowl would have been the Pittsburgh Steelers over the Seattle Seahawks by 10.13–9.68=.45. There is no home advantage because the Super Bowl field was not a home field for either team.
Our model is not linear because the target cell adds together terms of the form (Home Team Rating+Home Field Edge–Visiting Team Rating)2. Recall that for a Solver model to be linear, the target cell must be created by adding together terms with the form (changing cell)*(constant). This relationship doesn’t exist in this case, so our model is not linear. Solver does obtain the correct answer, however, for any sports-rating model in which the target cell minimizes the sum of squared errors.
The file Nfl01.xlsx contains scores for every regular season game during the 2001 NFL season. Rate the teams. Who would you have forecast to make the Super Bowl?
The file Nfl02.xlsx contains scores for every regular season game during the 2002 NFL season. Rate the teams. Who would you have forecast to make the Super Bowl?
The file Nfl03.xlsx contains scores for every regular season game during the 2003 NFL season. Rate the teams. Who would you have forecast to make the Super Bowl?
The file Nfl04.xlsx contains scores for every regular season game during the 2004 NFL season. Rate the teams. Who would you have forecast to make the Super Bowl?
For the 2004 season, devise a method to predict the actual score of each game. Hint: Give each team an offensive rating and a defensive rating. Who had the best offense? Who had the best defense?
True or False? An NFL team could lose every game and be an above average team.
The file Nba01_02.xlsx contains scores for every game during the 2001–2002 NBA season. Rate the teams.
The file Nba02_03.xlsx contains scores for every regular season game during the 2002–2003 NBA season. Rate the teams.
The file Worldball.xlsx contains all scores from the 2006 World Basketball Championships. Rate the teams. Who were the best three teams?
Our method of rating teams works fine for football and basketball. What problems arise if we apply our method to hockey or baseball?