Predictions Based on State-Level Pre-Election Polls
Analyses by William F. Christensen & Alan Vaughn
Department of Statistics
Summary
| PRESIDENTIAL (electoral college) | Predicted (most likely outcome) | Predicted (expected value) | ACTUAL* |
| McCain electoral votes | 185 | 190.3 | 173 |
| Obama electoral votes | 353 | 347.7 | 365 |
* Based on current projections; MO still uncalled.
| PRESIDENTIAL (state- and district-level predictions) | States awarding electoral votes to statewide winner (out of 51) | Congressional districts awarding electoral votes** (out of 5) |
| Correctly predicted | 50 (all except IN) | 4 (ME1, ME2, NE1, NE3) |
| Incorrectly predicted | 1 (IN) | 1 (NE2) |
** District-level polling data unavailable; ME and NE award one electoral vote for each congressional district and two electoral votes for the statewide winner.
| SENATE (state-level predictions) | |
| Correctly predicted* | 34 (all except MN) |
| Incorrectly predicted* | 1 (MN) |
* The model predicted that Coleman-R would win MN by a 1% margin; after a recount, Franken-D leads by less than 0.01%. The outcome is now in the hands of the courts.
Presidential Race Analysis
All 51 states will be correctly picked except for

Figure A. Predicted and actual vote share
for McCain (%McCain / (%McCain+%Obama)).
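The two-party vote share in the caption can be computed as in the minimal sketch below. The function name and the poll numbers are hypothetical and serve only to illustrate the formula %McCain / (%McCain+%Obama).

```python
def two_party_share(pct_a: float, pct_b: float) -> float:
    """Return candidate A's share of the combined A + B support."""
    total = pct_a + pct_b
    if total <= 0:
        raise ValueError("combined support must be positive")
    return pct_a / total


# Hypothetical poll: 46% McCain, 50% Obama, 4% other or undecided.
mccain_share = two_party_share(46.0, 50.0)
print(round(mccain_share, 4))  # 0.4792
```

Dropping third-party and undecided respondents in this way puts every state on the same scale, so a value above 0.5 means McCain leads Obama.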

Figure B. Predicted and actual vote share
for McCain (%McCain / (%McCain+%Obama)), considering
only the closest elections. Font size corresponds to the number of polls
conducted in the state during the last 3 weeks before the election. As
expected, states that were polled more frequently had more accurate
predictions of the vote shares.
Senate Analysis
All of the called
Senate races were correctly picked; the Minnesota Senate race is to be
decided by the courts. Our model predicted a win for Coleman-R with a 1%
margin, but the results of a recount indicate that Franken-D leads by less
than 0.01%.

Figure C. Senate race results. Predicted
and actual vote share for the Democratic Senate candidate (%Dem / (%Rep+%Dem)), considering only the closest elections. Font size
corresponds to the number of polls conducted in the state during the
last 3 weeks before the election. As expected, states that were polled more
frequently had more accurate predictions of the vote shares.
Although much of the media
attention during presidential election years focuses on polls tracking national
popular support for the candidates, any assessment of who will win the
presidency must account for the role of the Electoral College in this
multistage election process. Using state-level pre-election
polls, we obtain probabilities associated with possible Electoral College
outcomes. These analyses consider the probability of various outcomes if
the election were held today. That is, rather than trying to forecast
election-day outcomes, we track current trends in state-level voter
attitudes and translate those attitudes into the outcome of a
hypothetical election held today.
For each day's updated
analysis, we simulate 50,000 elections using the accumulated state-level
polling data.
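The simulation step can be sketched roughly as follows. This is only an illustration under stated assumptions: each state's McCain two-party share is drawn from a Beta posterior, the state's electoral votes go to whoever exceeds 50%, and the three-state map with its posterior parameters is entirely hypothetical. The function name `simulate_elections` is ours, and the sketch is not the exact procedure of Christensen and Florence (2008).

```python
import random


def simulate_elections(states, n_sims=50_000, seed=0):
    """Monte Carlo sketch of the electoral college.

    `states` maps a label to (electoral_votes, a, b), where Beta(a, b) is an
    assumed posterior for McCain's two-party vote share in that state.
    Returns (outcome probabilities, McCain's mean electoral votes).
    """
    rng = random.Random(seed)
    total_ev = sum(ev for ev, _, _ in states.values())
    counts = {"mccain": 0, "obama": 0, "tie": 0}
    ev_sum = 0
    for _ in range(n_sims):
        # McCain carries a state when his simulated share exceeds 50%.
        mccain_ev = sum(
            ev for ev, a, b in states.values() if rng.betavariate(a, b) > 0.5
        )
        ev_sum += mccain_ev
        if 2 * mccain_ev > total_ev:
            counts["mccain"] += 1
        elif 2 * mccain_ev < total_ev:
            counts["obama"] += 1
        else:
            counts["tie"] += 1
    probs = {name: count / n_sims for name, count in counts.items()}
    return probs, ev_sum / n_sims


# Hypothetical three-state map (20 electoral votes in total).
toy_states = {
    "State A": (10, 520.0, 480.0),
    "State B": (6, 450.0, 550.0),
    "State C": (4, 500.0, 500.0),
}
probs, expected_ev = simulate_elections(toy_states)
print(probs, round(expected_ev, 1))
```

The fraction of simulated elections won by each candidate estimates the win probabilities, and the mean of the simulated electoral-vote totals gives the expected value.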

Figure 1. Time series plot for the probabilities of a
McCain win, an Obama win, and an electoral college
tie (if the election were held today).

Figure 2. Probability of McCain winning each state
(if the election were held today). The state labels colored purple are those
states whose probability of going to McCain is between 10% and 90% (i.e.,
closely contested). From the left (states most heavily favoring Obama), the blue numbers above the bars denote the
cumulative total of electoral votes Obama would
receive if he won all states in which he has a higher probability of winning.
From the right (states most heavily favoring McCain), the red numbers above the
bars denote the cumulative total of electoral votes McCain would receive if he
won all states in which he has a higher probability of winning. "ME1,"
"NE3," etc. denote the congressional district in

Figure 3. Sampling distribution for "McCain
electoral votes" based on 50,000 simulated elections (if the election were
held today). The height of each bar represents the probability of McCain
winning that many electoral votes. The bars to the left of the blue line
represent simulated elections in which Obama wins
(i.e., McCain receives fewer than 269 electoral votes). Similarly, the bars to
the right of the red line represent simulated elections in which McCain wins
(i.e., receives more than 269 electoral votes). The bar between the blue and
red lines represents the probability of a 269-to-269 tie in the electoral
college (requiring a tie-breaking vote among state delegations in the

Figure 4. Most likely electoral vote outcome for Nov
4 based on probabilities in Figure 2. As shown in Figure 2,

Figure 5. Time series plot for the expected number of
electoral votes for McCain and Obama (if the election
were held today). Plotted values represent the average number of electoral
votes received among that day's simulated elections.

Figure 6. Probability of a McCain win, an Obama win, and an electoral college tie if there is a pro-Obama polling bias. Probabilities are calculated by
decreasing the vote share spread by a fixed amount for each state and then
re-running the simulation. If the spread is decreased by 6.7 points in every
state, the probabilities for McCain and Obama wins
are equal. Equivalently, the Bradley effect would
have to account for a 6.7-point spread for the race to be even in terms of electoral college probabilities.
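The sensitivity analysis in Figure 6 can be illustrated with the sketch below. It rests on several assumptions that are ours rather than the source's: each state's McCain two-party share is approximated by a normal distribution, a spread reduction of d points is modeled as a d/2-point gain in McCain's two-party share, and the three-state map is hypothetical.

```python
import random


def win_probs_with_shift(states, shift_points, n_sims=20_000, seed=0):
    """Re-run a toy simulation after shrinking the Obama-minus-McCain spread.

    `states` maps a label to (electoral_votes, mean_share, sd), a normal
    approximation (an assumption) to McCain's two-party share.
    Returns (P(McCain win), P(Obama win), P(tie)).
    """
    rng = random.Random(seed)
    total_ev = sum(ev for ev, _, _ in states.values())
    # A d-point cut in the spread is treated as d/2 points added to McCain.
    share_shift = shift_points / 200.0
    mccain = obama = tie = 0
    for _ in range(n_sims):
        mccain_ev = 0
        for ev, mean, sd in states.values():
            if rng.gauss(mean + share_shift, sd) > 0.5:
                mccain_ev += ev
        if 2 * mccain_ev > total_ev:
            mccain += 1
        elif 2 * mccain_ev < total_ev:
            obama += 1
        else:
            tie += 1
    return mccain / n_sims, obama / n_sims, tie / n_sims


# Hypothetical map that leans toward Obama before any shift is applied.
toy_states = {
    "State A": (10, 0.47, 0.02),
    "State B": (6, 0.49, 0.02),
    "State C": (5, 0.53, 0.02),
}
for shift in (0.0, 2.0, 4.0, 6.0, 8.0):
    p_mccain, p_obama, p_tie = win_probs_with_shift(toy_states, shift)
    print(f"{shift:4.1f}  {p_mccain:.3f}  {p_obama:.3f}  {p_tie:.3f}")
```

Scanning the shift until the two win probabilities cross gives the break-even bias, which is the quantity reported as 6.7 points in the caption.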
For each day's updated
analysis, we simulate 50,000 elections using the accumulated state-level Senate
race polling data.

Figure 7. Left: Simulated distribution for the number
of Democratic

States where the Democratic
Senate candidate's daily vote-share estimates are most highly correlated with Obama's state-level vote-share estimates (since Sep 1,
including only states where the Senate race has been polled at least 5 times):

As of today, the Southeast
contains the states where Obama's changes in the
polls are most closely correlated with the Democratic Senate candidate's
changes in vote-share estimates.
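A ranking of this kind could be produced along the lines of the sketch below, which computes the Pearson correlation between two daily series for each state and sorts the states by it. The daily estimates shown are hypothetical, and correlating the estimates themselves (rather than their day-to-day changes) is an assumption.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length numeric series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two series of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant series has no defined correlation")
    return sxy / sqrt(sxx * syy)


# Hypothetical daily two-party vote-share estimates for two states.
obama_daily = {
    "State A": [0.48, 0.49, 0.50, 0.51, 0.50],
    "State B": [0.52, 0.53, 0.52, 0.54, 0.53],
}
senate_dem_daily = {
    "State A": [0.50, 0.51, 0.52, 0.53, 0.52],
    "State B": [0.47, 0.46, 0.48, 0.46, 0.47],
}

# Rank states from most to least correlated.
ranked = sorted(
    ((pearson(obama_daily[s], senate_dem_daily[s]), s) for s in obama_daily),
    reverse=True,
)
for r, state in ranked:
    print(f"{state}: {r:+.2f}")
```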
Presidential Race. The statistical methods used for these analyses are
described in detail in an article in the February 2008 issue of The American Statistician (Christensen
and Florence, 2008). These analyses employ the Bayesian approach described
therein, with a weighting scheme that assigns full weight to a
recent poll and half weight to a 17-day-old poll. During the last two weeks
before the election, we gradually shorten this half-life to 8 days (by
Nov 3). The prior (beta) distribution for each state was constructed using the
final vote shares for Bush and Kerry in the 2004 presidential election. (In
Senate Race. A similar approach is used. Prior distributions for
the proportion of Republican/Democratic voters in each race are derived from
actual party vote shares in the 2004 presidential election (with a prior
weight as influential as a new poll with 10 respondents) and in the 2002
Senate election (with a prior weight of 20 respondents if the incumbent ran
against a viable candidate from the other party and a prior weight of 0
respondents otherwise). Thus, for most states, the accumulated polling data
quickly swamp the relatively weak influence of the prior. For all analyses
below, independents (Lieberman-CT and Sanders-VT) are included in the
Democratic caucus.
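The poll weighting and the weak priors described above can be sketched as follows. Several details are assumptions rather than the published method: the weight is taken to decay exponentially between the stated points (full weight at age 0, half weight at 17 days), the half-life is shortened linearly over the final 14 days, and a poll's weight is applied by scaling its sample size before the Beta update. All function names and poll figures are hypothetical.

```python
def half_life_schedule(days_until_nov3, start=17.0, end=8.0, ramp_days=14):
    """Half-life in days; a linear ramp from 17 to 8 is assumed."""
    if days_until_nov3 >= ramp_days:
        return start
    frac = max(days_until_nov3, 0) / ramp_days
    return end + (start - end) * frac


def poll_weight(age_days, half_life_days=17.0):
    """Weight 1.0 for a brand-new poll, 0.5 for one that is a half-life old."""
    return 0.5 ** (age_days / half_life_days)


def beta_posterior(prior_share, prior_n, polls, half_life_days=17.0):
    """Return Beta(a, b) parameters for one candidate's two-party share.

    prior_share: vote share from the earlier election used as the prior.
    prior_n: prior weight expressed as a number of poll respondents.
    polls: list of (age_days, sample_size, two_party_share).
    """
    a = prior_share * prior_n
    b = (1.0 - prior_share) * prior_n
    for age_days, sample_size, share in polls:
        # Assumption: down-weighting shrinks the effective sample size.
        effective_n = poll_weight(age_days, half_life_days) * sample_size
        a += effective_n * share
        b += effective_n * (1.0 - share)
    return a, b


# Hypothetical state: prior share 0.52 worth 10 respondents, plus two polls.
polls = [(0, 600, 0.47), (17, 800, 0.49)]
a, b = beta_posterior(0.52, 10, polls)
print(round(a / (a + b), 4))  # posterior mean of the vote share
```

With a prior worth only 10 respondents, even one ordinary poll dominates the posterior, which is the sense in which the polling data swamp the prior.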
The state-level pre-election
polling data used for these analyses were compiled from several online polling
sources, the primary source being www.realclearpolitics.com,
with additional polling information gathered at www.pollster.com and www.fivethirtyeight.com.
No polling information was used that originated from partisan organizations or
from mail-in or internet polling sites (such as Zogby
Interactive). The first polls used in these analyses were the most recent
state-level polls available as of 31 July 2008. Beginning 1 Aug 2008, polls were
added daily to the database. For most states, the polling data used in the
analyses were gathered on 1 August 2008 or later.