2008 Election Predictions

Predictions Based on State-Level Pre-Election Polls

 

Analyses by

William F. Christensen & Alan Vaughn

Department of Statistics

Brigham Young University

 

Post-Election Analysis of Model Performance (based on news as of Jan 26, 2009)

 

Summary

 

  • Presidential Race:

PRESIDENTIAL (electoral college)

                          Predicted                Predicted
                          (most likely outcome)    (expected value)    ACTUAL*
McCain electoral votes    185                      190.3               173
Obama electoral votes     353                      347.7               365

* based on current projections; MO still uncalled

PRESIDENTIAL (state- and district-level predictions)

                          States awarding electoral votes     Congressional districts awarding
                          to statewide winner (out of 51)     electoral votes** (out of 5)
Correctly predicted       50 (all except IN)                  4 (ME1, ME2, NE1, NE3)
Incorrectly predicted     1 (IN)                              1 (NE2)

** District-level polling data were unavailable. ME and NE award one electoral vote to the winner of each congressional district and two electoral votes to the statewide winner.

  • Senate Races:

SENATE (state-level predictions)

Correctly predicted*      34 (all except MN)
Incorrectly predicted*    1 (MN)

* The model predicted that Coleman-R would win MN by a 1% margin; after a recount, Franken-D leads by less than 0.01%. The outcome is now in the hands of the courts.

Presidential Race Analysis

 

All 51 statewide contests (the 50 states plus DC) were picked correctly except Indiana, which Obama won. The plots below show predicted and actual McCain vote share, defined as %McCain / (%McCain + %Obama). A similar plot illustrates the Senate results.

Figure A. Predicted and actual vote share for McCain (%McCain / (%McCain+%Obama)).

Figure B. Predicted and actual vote share for McCain (%McCain / (%McCain+%Obama)), considering only the closest elections. Size of the font corresponds to the number of polls conducted in the state during the last 3 weeks before the election. As expected, states which were polled more frequently had more accurate predictions of the vote shares.

 

 

Senate Analysis

 

All of the called Senate races were picked correctly; the fate of the Minnesota Senate race is in the hands of the courts. Our model predicted a win for Coleman-R by a 1% margin, but the results of a recount indicate that Franken-D leads by less than 0.01%.

Figure C. Senate race results. Predicted and actual vote share for the Democratic Senate candidate (%Dem / (%Rep+%Dem)), considering only the closest elections. Size of the font corresponds to the number of polls conducted in the state during the last 3 weeks before the election. As expected, states which were polled more frequently had more accurate predictions of the vote shares.

 

 

 

Pre-Election Prediction of Daily Outcome Probabilities (analyses below were last updated on the morning of election day)

Motivation

Although much of the media attention during presidential election years focuses on polls tracking national popular support for the candidates, the presidency is decided through the Electoral College, so any assessment of who is likely to win must account for its role in this multistage election process. Using state-level pre-election polls, we obtain probabilities associated with possible electoral college outcomes. These analyses consider the probability of various outcomes if the election were held today. That is, rather than trying to forecast election day outcomes, we track current trends in state-level voter attitudes and translate those current attitudes to a hypothetical election held today.

 


Daily Analysis (Presidential Race)

For each day's updated analysis, we simulate 50,000 elections using the accumulated state-level polling data.
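The daily simulation can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' code: each state or district is given a hypothetical Beta(alpha, beta) distribution for McCain's two-party vote share (see Methodology), draws are assumed independent across units, and McCain takes a unit's electoral votes when his simulated share exceeds 50%. The function names `simulate_elections` and `summarize` are illustrative.

```python
import random
from collections import Counter


def simulate_elections(units, n_sims=50_000, seed=None):
    """Simulate n_sims elections and tally McCain's electoral votes.

    units: dict mapping a state or district label to (alpha, beta, ev), where
           Beta(alpha, beta) is a (hypothetical) distribution for McCain's
           two-party vote share and ev is the electoral votes at stake.
    Returns a Counter: simulated McCain electoral-vote total -> count.
    """
    rng = random.Random(seed)
    totals = Counter()
    for _ in range(n_sims):
        mccain_ev = 0
        for alpha, beta, ev in units.values():
            # Independent draw per unit (an assumption); McCain wins above 50%.
            if rng.betavariate(alpha, beta) > 0.5:
                mccain_ev += ev
        totals[mccain_ev] += 1
    return totals


def summarize(totals, total_ev=538):
    """Return (P(McCain win), P(Obama win), P(tie), expected McCain EVs)."""
    n = sum(totals.values())
    p_mccain = sum(c for ev, c in totals.items() if 2 * ev > total_ev) / n
    p_obama = sum(c for ev, c in totals.items() if 2 * ev < total_ev) / n
    p_tie = sum(c for ev, c in totals.items() if 2 * ev == total_ev) / n
    expected = sum(ev * c for ev, c in totals.items()) / n
    return p_mccain, p_obama, p_tie, expected
```

Under these assumptions, the three probabilities from `summarize` correspond to one day's points in Figure 1, the expected total to Figure 5, and the `Counter` itself to the distribution in Figure 3.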

 

Figure 1. Time series plot for the probabilities of a McCain win, an Obama win, and an electoral college tie (if the election were held today).

 

Figure 2. Probability of McCain winning each state (if the election were held today). State labels colored purple mark states whose probability of going to McCain is between 10% and 90% (i.e., closely contested). Reading from the left (states most heavily favoring Obama), the blue numbers above the bars give the cumulative total of electoral votes Obama would receive if he won that state and every state to its left (i.e., every state where his probability of winning is at least as high). Reading from the right (states most heavily favoring McCain), the red numbers above the bars give the cumulative total of electoral votes McCain would receive if he won that state and every state to its right. Labels such as "ME1" and "NE3" denote congressional districts in Maine or Nebraska, where electoral votes are also awarded by district.
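The cumulative tallies in the Figure 2 caption amount to sorting states by McCain's win probability and taking running sums of electoral votes from each end. A minimal sketch, with a hypothetical function name and toy inputs; the 10%/90% band follows the caption's definition of "closely contested".

```python
def cumulative_tallies(probs, evs, low=0.10, high=0.90):
    """Running electoral-vote totals for a Figure-2-style bar chart.

    probs: label -> probability McCain wins that state or district.
    evs:   label -> electoral votes for that label.
    Returns (labels, obama_cum, mccain_cum, contested), with labels ordered
    from most pro-Obama (lowest McCain probability) to most pro-McCain.
    obama_cum[i] is Obama's total if he wins labels[0..i];
    mccain_cum[i] is McCain's total if he wins labels[i..end].
    """
    labels = sorted(probs, key=probs.get)  # most pro-Obama first
    obama_cum, running = [], 0
    for label in labels:
        running += evs[label]
        obama_cum.append(running)
    mccain_cum, running = [0] * len(labels), 0
    for i in reversed(range(len(labels))):
        running += evs[labels[i]]
        mccain_cum[i] = running
    # "Purple" labels: McCain win probability between 10% and 90%.
    contested = [label for label in labels if low <= probs[label] <= high]
    return labels, obama_cum, mccain_cum, contested
```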

 

Figure 3. Sampling distribution of McCain's electoral votes based on 50,000 simulated elections (if the election were held today). The height of each bar is the probability of McCain winning that many electoral votes. Bars to the left of the blue line represent simulated elections in which Obama wins (i.e., McCain receives fewer than 269 electoral votes); bars to the right of the red line represent simulated elections in which McCain wins (i.e., receives more than 269 electoral votes). The bar between the blue and red lines is the probability of a 269-to-269 tie in the electoral college (which would require a tie-breaking vote among state delegations in the U.S. House in January 2009).

 

 

Figure 4. Most likely electoral vote outcome for Nov 4 based on probabilities in Figure 2. As shown in Figure 2, North Carolina is very close to 50-50. (Graphic created using the Create Your Own Electoral Map tool at realclearpolitics.com.)

 

Figure 5. Time series plot for the expected number of electoral votes for McCain and Obama (if the election were held today). Plotted values represent the average number of electoral votes received among that day's simulated elections.
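The two predicted totals in the post-election summary table (most likely outcome and expected value) can both be computed from state-level win probabilities. In this sketch, with a hypothetical function name and toy inputs, the "most likely" map gives each state or district to its favorite, as in Figure 4 (this is the single most probable map only if units are independent, which is an assumption). The expected value uses linearity of expectation, which holds with or without independence and is what the average over simulated elections in Figure 5 approximates.

```python
def predicted_totals(probs, evs):
    """Return (most_likely_mccain_ev, expected_mccain_ev).

    probs: label -> probability McCain wins; evs: label -> electoral votes.
    """
    # Figure 4 style: each unit goes to the candidate favored to win it
    # (an exact 50% is assigned to Obama here, an arbitrary choice).
    most_likely = sum(evs[k] for k, p in probs.items() if p > 0.5)
    # Linearity of expectation: E[McCain EVs] = sum over units of p_k * ev_k.
    expected = sum(p * evs[k] for k, p in probs.items())
    return most_likely, expected
```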

 

Figure 6. Probability of a McCain win, an Obama win, and an electoral college tie if the polls carry a pro-Obama bias. Probabilities are calculated by decreasing the vote-share spread by a fixed amount in every state and then re-running the simulation. If the spread is decreased by 6.7 points in every state, the probabilities of McCain and Obama wins are equal. Put another way, a Bradley effect would have to account for a 6.7-point spread in every state for the race to be even in terms of electoral college probabilities.
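The break-even calculation behind Figure 6 can be sketched as a search over the spread reduction. Assumptions, all hypothetical: the spread is Obama's two-party margin in points, so removing d points raises McCain's two-party share by d/2 points in every unit; each unit's McCain share is approximated as normal with a given mean and standard deviation (a stand-in for the beta distributions); and bisection, a swapped-in search method, finds where the simulated McCain and Obama win probabilities cross. The same random draws are reused at every trial shift (common random numbers), which keeps McCain's simulated win probability nondecreasing in the shift so the bisection is well behaved. The 6.7-point value in the caption comes from the authors' data, not from this toy.

```python
import random


def win_probs(units, shift, draws):
    """(P(McCain win), P(Obama win)) after removing `shift` points of spread.

    units: list of (mean_mccain_share, sd, ev) with shares as fractions.
    draws: pre-drawn standard-normal deviates, one row per simulated election.
    """
    total_ev = sum(ev for _, _, ev in units)
    mccain = obama = 0
    for row in draws:
        # McCain's share rises by shift/2 points, i.e. shift/200 as a fraction.
        ev_m = sum(ev for (mu, sd, ev), z in zip(units, row)
                   if mu + shift / 200 + sd * z > 0.5)
        if 2 * ev_m > total_ev:
            mccain += 1
        elif 2 * ev_m < total_ev:
            obama += 1
    n = len(draws)
    return mccain / n, obama / n


def break_even_shift(units, n_sims=5_000, seed=0, lo=0.0, hi=30.0, tol=0.01):
    """Bisect for the spread reduction at which P(McCain) equals P(Obama)."""
    rng = random.Random(seed)
    draws = [[rng.gauss(0.0, 1.0) for _ in units] for _ in range(n_sims)]
    while hi - lo > tol:
        mid = (lo + hi) / 2
        p_m, p_o = win_probs(units, mid, draws)
        if p_m < p_o:
            lo = mid  # McCain still behind: a larger shift is needed
        else:
            hi = mid
    return (lo + hi) / 2
```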

 

Daily Analysis (Senate Race)

For each day's updated analysis, we simulate 50,000 elections using the accumulated state-level Senate race polling data.

 

Figure 7. Left: Simulated distribution of the number of Democratic U.S. senators if the election were held today. Right: Probability of the Democratic Senate candidate winning in each state. State labels colored purple mark Senate races whose probability of going Democratic is between 10% and 90% (i.e., closely contested). Reading from the left (states most heavily favoring the Republican), the red numbers above the bars give the cumulative number of Republican senators if Republicans won that race and every race to its left. Reading from the right (states most heavily favoring the Democrat), the blue numbers above the bars give the cumulative number of Democratic senators if Democrats won that race and every race to its right.
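The left panel of Figure 7 is produced by simulation. As an illustrative alternative (not the authors' method), if Senate races are treated as independent, the distribution of Democratic seats can be computed exactly by convolving the per-race win probabilities, which gives a Poisson-binomial distribution. The function name, holdover count, and probabilities in the test are hypothetical.

```python
def seat_distribution(win_probs, holdover_seats):
    """Exact distribution of Democratic Senate seats, assuming independent races.

    win_probs: probability the Democratic candidate wins each contested race.
    holdover_seats: Democratic-caucus seats not up for election (independents
                    such as Lieberman-CT and Sanders-VT would be counted here).
    Returns a dict: total Democratic seats -> probability.
    """
    dist = [1.0]  # dist[k] = P(Democrats win exactly k of the races so far)
    for p in win_probs:
        new = [0.0] * (len(dist) + 1)
        for k, q in enumerate(dist):
            new[k] += q * (1 - p)  # Republican wins this race
            new[k + 1] += q * p    # Democrat wins this race
        dist = new
    return {holdover_seats + k: q for k, q in enumerate(dist)}
```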

 

States where the Democratic Senate candidate's daily vote-share estimates are most highly correlated with Obama's state-level vote-share estimates (since Sep 1, including only states where Senate race has been polled at least 5 times):

As of today, the Southeast contains the states where Obama's changes in the polls are most closely correlated with the Democratic Senate candidate's changes in vote-share estimates.

 

Methodology

 

Presidential Race. The statistical methods used for these analyses are described in detail in an article in the February 2008 issue of The American Statistician (Christensen and Florence, 2008). These analyses employ the Bayesian approach described there, with a poll-weighting scheme that assigns full weight to a recent poll and half weight to a 17-day-old poll. During the last two weeks of the campaign, we gradually decrease this half-life to 8 days (reached on Nov 3). The prior (beta) distribution for each state was constructed from the final vote shares for Bush and Kerry in the 2004 presidential election. (In Maine and Nebraska, where congressional districts award their own electoral votes, priors were obtained from district-level vote shares.) The "prior weight" for the 2004 election data makes the prior as influential as a new poll with 150 respondents, so for most states the accumulated polling data quickly swamp the prior. Because this approach is based on data assimilation rather than forecasting, we make no effort to estimate or adjust for possible nonsampling error (e.g., nonresponse bias or coverage bias). The accuracy of these results therefore depends on how well the random sample obtained for each opinion poll represents a microcosm of the state's voters. To minimize the potential impact of nonsampling error, we use polling data from a wide variety of reputable polling organizations.
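The weighting and updating steps in this paragraph can be sketched as follows. This is a minimal reading of the description, not a reproduction of Christensen and Florence (2008): each poll's respondents are down-weighted by 0.5^(age / half_life), and the 2004 Bush two-party share seeds a Beta prior worth 150 respondents. The text says only that the half-life is decreased "gradually" during the final two weeks, so the linear schedule and the Oct 21 start date are assumptions. Function names and the poll record format are hypothetical.

```python
from datetime import date


def half_life(day, start=date(2008, 10, 21), end=date(2008, 11, 3),
              initial=17.0, final=8.0):
    """Poll half-life in days; ASSUMED to shrink linearly from 17 to 8."""
    if day <= start:
        return initial
    if day >= end:
        return final
    frac = (day - start).days / (end - start).days
    return initial + frac * (final - initial)


def posterior(bush_share_2004, polls, today, prior_weight=150.0):
    """Beta(alpha, beta) parameters for McCain's two-party vote share.

    bush_share_2004: Bush / (Bush + Kerry) in the state in 2004.
    polls: list of (poll_date, n_respondents, mccain_two_party_share).
    """
    # Prior as influential as a poll of prior_weight respondents.
    alpha = prior_weight * bush_share_2004
    beta = prior_weight * (1.0 - bush_share_2004)
    h = half_life(today)
    for poll_date, n, share in polls:
        age = (today - poll_date).days
        w = 0.5 ** (age / h)           # weight 1 today, 0.5 at age h
        alpha += w * n * share         # weighted McCain "respondents"
        beta += w * n * (1.0 - share)  # weighted Obama "respondents"
    return alpha, beta
```

The resulting `(alpha, beta)` pairs are the kind of per-state input the simulation sketch under Daily Analysis would consume.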

 

Senate Race. A similar approach is used. Prior distributions for the proportion of Republican/Democratic voters in each race are derived from the actual party vote shares in the 2004 presidential election (with a prior weight as influential as a new poll with 10 respondents) and in the 2002 Senate election (with a prior weight of 20 respondents if the incumbent ran against a viable candidate from the other party, and 0 respondents otherwise). Thus, for most states, the accumulated polling data quickly swamp the relatively weak influence of the prior. For all Senate analyses, independents (Lieberman-CT and Sanders-VT) are counted with the Democratic caucus.
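The Senate prior can be sketched the same way. The text gives the two weights but not how they are combined, so this sketch assumes the pseudo-respondents from both elections are simply added to the Beta parameters for the Democratic two-party share. The function name and arguments are hypothetical.

```python
def senate_prior(dem_pres_share_2004, dem_senate_share_2002=None,
                 viable_challenger_2002=False):
    """Beta(alpha, beta) prior for the Democratic two-party share in a race.

    dem_pres_share_2004: Kerry / (Kerry + Bush) in the state (10 respondents).
    dem_senate_share_2002: Democratic two-party share in the 2002 Senate race,
        counted as 20 respondents only if the incumbent faced a viable
        candidate from the other party (otherwise weight 0).
    """
    alpha = 10 * dem_pres_share_2004
    beta = 10 * (1 - dem_pres_share_2004)
    if dem_senate_share_2002 is not None and viable_challenger_2002:
        alpha += 20 * dem_senate_share_2002
        beta += 20 * (1 - dem_senate_share_2002)
    return alpha, beta
```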

 

Data

The state-level pre-election polling data used for these analyses were compiled from several online polling sources, primarily www.realclearpolitics.com, with additional polling information gathered from www.pollster.com and www.fivethirtyeight.com. No polls originating from partisan organizations or from mail-in or internet polling sites (such as Zogby Interactive) were used. For each state, the first poll used in these analyses was the most recent state-level poll available as of 31 July 2008. Beginning 1 Aug 2008, polls were added to the database daily, so for most states the polling data used in the analyses were gathered on 1 August 2008 or later.
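The inclusion rules in this paragraph can be expressed as a simple filter. This sketch assumes hypothetical poll records with `state`, `end_date`, `sponsor_partisan`, and `mode` fields, and uses a poll's end date as a proxy for when it became available; the field names and mode labels are illustrative, not the authors' database schema.

```python
from dataclasses import dataclass
from datetime import date

CUTOFF = date(2008, 7, 31)


@dataclass
class Poll:
    state: str
    end_date: date
    sponsor_partisan: bool
    mode: str  # hypothetical labels, e.g. "phone", "internet", "mail"


def select_polls(polls):
    """Apply the inclusion rules described above (hypothetical record format)."""
    # Drop partisan-sponsored polls and mail-in or internet polls.
    eligible = [p for p in polls
                if not p.sponsor_partisan and p.mode not in ("internet", "mail")]
    # Seed each state with its most recent eligible poll as of 31 July 2008.
    latest = {}
    for p in eligible:
        if p.end_date <= CUTOFF:
            if p.state not in latest or p.end_date > latest[p.state].end_date:
                latest[p.state] = p
    # Keep every eligible poll from 1 August 2008 onward.
    later = [p for p in eligible if p.end_date > CUTOFF]
    return list(latest.values()) + later
```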