How 538 is adjusting our election model for Harris versus Trump
On Thursday, Aug. 22, Vice President Kamala Harris officially accepted the Democratic Party's nomination to be its 2024 candidate for president. The following day, we turned on our election forecast for the race between Harris and former President Donald Trump.
But before you dive into the numbers, I wanted to explain how we've changed the model that powers that forecast. These changes address issues that became apparent while running the model for President Joe Biden and Trump in June and July. We have also adjusted our model to make sure it can properly handle cases in which major third-party candidates, such as Robert F. Kennedy Jr., drop out of the race.
Our model now gives more weight to polls
At a high level, the version of the model we published before Biden dropped out of the race allowed for a dynamic, rather than a static and explicit, weighting on the polls. When we launched the forecast in June, this did not appear to be a problem; the model was generating estimates that were only slightly closer to our "fundamentals" indicators about the election — such as economic growth and whether an incumbent president is on the ballot — than to the polls, which had a lot of uncertainty in early June.
In July, however, the two sets of indicators started to diverge significantly. As the polls moved away from the fundamentals, the model did not react as expected to the new data and therefore generated estimates that deviated from the polls. (This divergence was possible because the old model did not explicitly dictate how much weight to put on polls versus fundamentals. More details on that below.)
Our model's overemphasis on the fundamentals stemmed from the way we explored future uncertainty in the election. In a nutshell, our old model worked by using the current day's polling average in every state as its starting point, then generating a forecast of what polls could say on Election Day by simulating random error in the race based on the amount of time remaining until the election. This allowed us to account for many different possible ways in which the election could unfold.
And when our model simulated random error for each state, it picked numbers that, on average, were supposed to pull the forecast slightly toward our fundamentals-based prediction of what the polling average would say on Election Day — with the precise strength of that pull determined by how much uncertainty we had about each indicator. (The more polls we had, or the less time remaining until Election Day, the more certain we were about the polls and the less we had to hedge toward the fundamentals.) The fundamentals were supposed to help steer us, in other words, amid a stormy sea of polls.
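For intuition, here is a deliberately simplified, single-state sketch of that simulation logic in Python. Every number in it (the starting average, the daily movement, the strength of the pull toward the fundamentals) is an illustrative assumption, not a value from 538's actual code:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative inputs only, not 538's actual numbers or code.
poll_avg_today = 3.9      # today's polling average in the state: Trump +3.9
fundamentals_pred = -1.0  # fundamentals prediction: Biden +1
days_left = 100           # days remaining until Election Day
daily_sd = 0.7            # assumed day-to-day random movement in the average
pull = 0.003              # assumed small daily pull toward the fundamentals

n_sims = 10_000
margin = np.full(n_sims, poll_avg_today)
for _ in range(days_left):
    # Each simulated path drifts slightly toward the fundamentals prediction
    # and picks up random error, so uncertainty grows with the time remaining.
    margin += pull * (fundamentals_pred - margin) + rng.normal(0, daily_sd, n_sims)

print(f"median simulated Election Day margin: {np.median(margin):+.1f}")
print(f"95% interval: {np.percentile(margin, [2.5, 97.5]).round(1)}")
```

The farther from Election Day you run a simulation like this, the wider the range of outcomes it produces and the more its center drifts toward the fundamentals, which is the behavior described above.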
The model was technically working as designed (and produced expected results in back-testing) but ended up hewing more toward the fundamentals because of the complexity of these random errors. Basically, when we told the model to figure out how much the polls were likely to change by November, the distribution of changes it came up with, instead of looking like a bell curve centered around zero, latched onto the fundamentals, since that was the only other data source available to it. While this is how the complex statistics powering our model were supposed to work in theory, in practice, the model produced unusual results that did not lend themselves to easy explanations.
That's the "Reader's Digest" version of what was going on with our old model. However, I know a lot of you will be interested in the additional details, so a fuller explanation can be found in the appendix at the end of this article. For now, I will explain how we addressed these issues in our new model.
Our new approach: Averaging different models together
Our revised method is to run the fundamentals- and polls-based forecasts completely separately, then to compute our overall forecast by taking a weighted average of the two. The weights correspond to the (un)certainty we have about each forecast in isolation.
We accomplish this averaging by using something called Bayesian model stacking. In a nutshell, what we're doing is running two separate versions of the 538 forecast, each producing 10,000 simulations of each party's vote share in each state. Then, we use the math explained in equation 13 here to estimate how much weight we should put on the polls-based forecast given how much time is left until the election — and thus how much uncertainty is left in the polling average relative to the fundamentals. The chart below shows an estimate of how much weight the model will put on the polls-only forecast given the day we're running the forecast and the uncertainty in the fundamentals forecast.
So, for example, 200 days before the election, our new model would have taken about 35 percent of the simulations from our fundamentals forecast and 65 percent of the simulations from our polls forecast, given the 9-point standard deviation in the fundamentals model's forecast of party vote margins. At 69 days before Election Day (the day we're publishing this article), 81 percent of the full forecast is derived from the polls (the precise weights can change based on the amount of uncertainty in the polls and fundamentals). By Nov. 5, the weight on the polls will reach 98.5 percent, with the remaining 1.5 percent coming from the small amount of uncertainty we still have about the level of support for each candidate in each state (but no volatility remaining in the polls — since it's Election Day). We still simulate random error in the forecast based on how much the polls themselves could miss by, but this does not affect the weight we place on the polls versus the fundamentals.
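Mechanically, once the stacking weight is known, building the combined forecast amounts to mixing the two models' simulations in proportion to that weight. Here is a minimal sketch in Python; the two simulation sets are stand-in normal distributions, and the 81 percent weight is simply plugged in from the example above, whereas the real model estimates it with the stacking math:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in simulation sets for one state's vote margin (illustrative only);
# in the real forecast, each model produces 10,000 simulations per state.
polls_sims = rng.normal(loc=1.0, scale=4.0, size=10_000)          # polls-based model
fundamentals_sims = rng.normal(loc=-1.0, scale=9.0, size=10_000)  # fundamentals-based model

# Weight on the polls-based forecast, e.g., the 81 percent cited 69 days out.
# In the real model this comes from the stacking calculation, not a constant.
w_polls = 0.81

# Draw each combined simulation from the polls-based model with probability
# w_polls, and from the fundamentals-based model otherwise.
take_polls = rng.random(10_000) < w_polls
combined = np.where(take_polls, polls_sims, fundamentals_sims)

print(f"combined median margin: {np.median(combined):+.1f}")
print(f"share of draws from the polls model: {take_polls.mean():.0%}")
```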
There are limits to forecasting
The process of updating our model has highlighted for me a big issue in forecasting elections: the small-N problem. See, with just 20 or so past presidential elections with enough polling data to train and test election models on, we can only have a certain degree of confidence that any single model will be the best predictor of the future. We think our new model sets us up well to address large industrywide polling misses or cases when historical variables (say, incumbency) stop being predictive of current elections, but even so, we do not claim to have the keys to a perfect forecast with laser-like predictive accuracy. Given the inherent measurement error in polling and somewhat unpredictable yearly changes in state-level election results, such a forecast is not actually possible.
Rather than hyper-accuracy, our goal is to tell a faithful story about the state of the 2024 election. We do not intend to design a model that tells you, with certainty, that Trump or Harris is going to win between 45 and 49 percent of the vote in some state. What we prefer to do is publish a model that is calibrated roughly correctly, so that events we give a 25-in-100, 50-in-100 or 75-in-100 chance of happening actually happen about that often. Empirically, based on how well it would have done in the 2000-2020 presidential elections, the new 538 forecast is well calibrated for this purpose. But historical back tests aren't the only reason we have confidence in this model; we also ran it for every day of the Biden-Trump race and the Harris-Trump race so far and verified that it moved appropriately in response to new polls.
For example, when we gave a candidate a 70-in-100 chance of winning the election in a given state, they won in reality about 74 times out of 100. The largest difference between estimated and actual probabilities was around the 30-in-100 mark; when we gave something a 30-in-100 chance to happen, it happened in reality just 22 times out of 100.
As an additional metric of model accuracy, we also calculate something called a Brier score, which scores predictions on a range from 0 to 1, with 0 representing the best and 1 representing the worst possible performance from a forecaster. As a point of reference, someone who gave every state a 50-50 chance of going either way would have a Brier score of 0.25. Our model's historic predictions have a Brier score of 0.033, matching the performance of expert forecasters, comparable quantitative political forecasts and political betting markets. But of course, our tests benefit somewhat from the advantage of hindsight, so this is not an apples-to-apples comparison.
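For readers who want to check the arithmetic, here is a small Python sketch of both checks: binning forecasts to compare predicted probabilities with actual outcomes, and computing a Brier score. The probabilities and outcomes below are made up for illustration and are not 538's historical predictions:

```python
import numpy as np

def brier_score(probs, outcomes):
    """Mean squared difference between forecast probabilities and 0/1 outcomes."""
    probs, outcomes = np.asarray(probs, float), np.asarray(outcomes, float)
    return np.mean((probs - outcomes) ** 2)

def calibration(probs, outcomes, edges=(0.0, 0.25, 0.5, 0.75, 1.01)):
    """For each probability bin, compare the average forecast to the actual rate."""
    probs, outcomes = np.asarray(probs, float), np.asarray(outcomes, float)
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (probs >= lo) & (probs < hi)
        if mask.any():
            print(f"{lo:.0%}-{min(hi, 1.0):.0%}: forecast {probs[mask].mean():.0%}, "
                  f"actual {outcomes[mask].mean():.0%} (n={mask.sum()})")

# Hypothetical state-level win probabilities and outcomes, for illustration only.
probs = [0.05, 0.20, 0.30, 0.35, 0.45, 0.55, 0.70, 0.75, 0.90, 0.95]
wins  = [0,    0,    0,    1,    0,    1,    1,    1,    1,    1]

calibration(probs, wins)
print(f"Brier score: {brier_score(probs, wins):.3f}")            # lower is better
print(f"50-50 forecaster: {brier_score([0.5] * 10, wins):.2f}")  # always 0.25
```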
The most we can hope for is that we publish a forecast that properly guides our audience's understanding of a complex world and captures the uncertainty inherent in elections, especially in polling. State-level polls have missed the final margin between the candidates by 4 points on average in the last two presidential elections. Uncertainty is the name of the game, and measuring it properly is our value-add.
With that in mind, we hope you find 538's new Trump-Harris forecast useful. If you have any questions or comments about it, you can always reach the team at polls@fivethirtyeight.com.
Appendix
The remainder of this article is for true modeling nerds. We are going to use phrases like "covariance matrix" and "Bayesian prior" — so if you're unfamiliar with those concepts, you might want to read our full methodology about the model first and then come back.
The details of our old model
I'd like to spend the rest of this article explaining in more detail how our old model was working. As an example, let's look at its final forecast of Pennsylvania on July 21. That afternoon, the model saw two main predictors of the margin between Trump and Biden in Pennsylvania on Election Day: (1) Trump's 3.9-point lead in polls of the state and (2) a fundamentals-based forecast that predicted Biden would win the state by 1 point.
The model also saw uncertainty about these values: Because the polls can move a significant amount between July and November, the model thought the final polling average on Election Day could be anywhere between Trump+18 and Biden+10 — equivalent to about 0.4 points of change, on average, per day. And because fundamentals are only roughly predictive of November results, the fundamentals-based prediction carried an even wider uncertainty interval: about 30 points on vote margin. To be sure, that is a very wide 95 percent confidence interval, reflecting the outside chance of a large election-to-election swing in the state and unpredictable effects of the political and economic environment.
It is easy enough to figure out what the model's combined prediction for one state (in this case, Pennsylvania) should be given just these two predictors and the uncertainty we have about them. Theoretically, each predictor gets a weight that is inversely proportional to its variance, and you'd end up with a combined median prediction that is closer to the polls (in this example, Trump+3.9) — since they have less uncertainty than the fundamentals (in this example, Biden+1) — and uncertainty that is less than either predictor's uncertainty on its own. Specifically, in this example, the polls should have received about 74 percent of the weight for the final prediction, while the fundamentals should have received 26 percent. The actual result of this calculation is a combined median prediction of Trump+2.6, with a standard deviation (about one-half of the full uncertainty interval) of 7.7 points.
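As a sanity check on those figures, here is the precision-weighted (inverse-variance) calculation in Python. The two standard deviations are assumptions chosen for illustration, roughly 9 points for the polls and 15 for the fundamentals, rather than values taken from 538's code; with them, the arithmetic reproduces the roughly 74/26 split, the Trump+2.6 combined median and the 7.7-point combined standard deviation described above:

```python
import numpy as np

# The two Pennsylvania predictors from July 21 (positive = Trump lead).
polls_margin = 3.9          # polling average: Trump +3.9
fundamentals_margin = -1.0  # fundamentals forecast: Biden +1

# Assumed standard deviations, for illustration only.
sd_polls = 9.0
sd_fundamentals = 15.0

# Each predictor is weighted by its precision (the inverse of its variance).
precision_polls = 1 / sd_polls**2
precision_fundamentals = 1 / sd_fundamentals**2
w_polls = precision_polls / (precision_polls + precision_fundamentals)

combined_margin = w_polls * polls_margin + (1 - w_polls) * fundamentals_margin
combined_sd = np.sqrt(1 / (precision_polls + precision_fundamentals))

print(f"weight on polls: {w_polls:.0%}")                  # about 74%
print(f"combined median margin: {combined_margin:+.1f}")  # about Trump +2.6
print(f"combined standard deviation: {combined_sd:.1f}")  # about 7.7 points
```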
In reality, however, our model was giving us a Pennsylvania prediction of Trump+0.3 with a standard deviation of 8.3 points. Clearly, when coming up with a forecast for Pennsylvania, our model was factoring in information other than the two predictors and their attendant uncertainty in that state alone.
This is where relationships between states came into play. Because our model calculated vote shares for all parties across all states simultaneously, it was not updating a fundamentals prior in Pennsylvania alone with polling data in Pennsylvania alone. Instead, it was also looking at differences in polling data and fundamentals priors in other states. This was by design; a good poll for Biden in Ohio should impact the prediction in Pennsylvania, and vice versa. In fact, the average correlation between states in our model was around 0.8 — meaning that if Biden gained 1 point in the polls relative to the fundamentals in one state, the model estimated he'd gain about 0.8 points, on average, in other states. (These correlations are stronger for geographically and demographically similar states and weaker for dissimilar ones.)
In other words, our model automated a statistical learning process. Specifically, we treated the value of each state's polling average on Election Day as an unknown parameter for the model to estimate. It computed those predictions by taking the present polling averages for every party in every state and adding some amount of movement. As mentioned in the main article above, that movement is designed to hedge the polling average today toward our prior for the polling average on Election Day — a prior that came from the fundamentals and the uncertainty we have about them.
How exactly you estimate that movement in every state, though, turns out to be very complicated, and is somewhat more a decision about programming than about modeling. Our model did this by drawing numbers from a multivariate normal distribution in which the amount of change for a given party in a given state is correlated both across parties (so an increase for Party A has to come at the cost of Party B) and across states (so if Party A does better in State C, it will also do better in State D). This means our computer was generating estimates for every entry of a 168-by-168 covariance matrix (56 electoral jurisdictions* times three parties, one row and column per state-party pair), for a total of 28,224 elements.
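Here is a toy version of that draw in Python, shrunk to four states and a single party, with an assumed 0.8 correlation between every pair of states; the real model works on the full 168-dimensional state-by-party grid, with correlations that vary by similarity:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: four states, an assumed 7-point standard deviation of movement
# in each, and an assumed 0.8 correlation between every pair of states.
states = ["PA", "WI", "MI", "OH"]
sd_move = 7.0
corr = np.full((4, 4), 0.8)
np.fill_diagonal(corr, 1.0)
cov = corr * sd_move**2

# Each row is one simulation of how much the margin moves in every state
# between now and Election Day; movement is shared across correlated states.
moves = rng.multivariate_normal(mean=np.zeros(4), cov=cov, size=10_000)

print(dict(zip(states, moves.std(axis=0).round(1))))  # per-state spread, about 7
print(np.corrcoef(moves.T).round(2))                  # recovered correlations, about 0.8
```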
To be clear, this is not a lot of parameters in absolute terms. Our model also estimated, as parameters, the value of the polling average in every state for every party on every day of the campaign. But there, the full range of possible solutions is very small; for example, if you have five polls clustered around Trump+10 in Florida, the model will be very hesitant to fit an average at just Trump+3. But when estimating the potential change in the race between now and November, we didn't have a lot of data. The only sources of information we had for Pennsylvania on July 21, for example, were a very uncertain polling average (with a lot of correlated movement added on top) ... and the fundamentals.
Because we ended up combining two models to make a prediction for November, we had to import a precise covariance matrix of state simulations from the fundamentals model to be used in the overall forecast model. But the rigidity of this matrix, contrasted with the uncertainty about the polls and overall sparseness of our problem, ended up forcing the model's overall estimates back toward the fundamentals more than intended.**
Think of it this way: Imagine that you're playing a game in which you have to guess where in the world a random picture was taken. You hold in your hand a picture of a stop sign in what appears to be an Eastern European language, but you're not sure. There's a large map in front of you, and you need to point to the place on the map where you think the picture was taken. Maybe you'd pick somewhere in Bulgaria and get lucky. Nice job!
But now, we are playing again, and I have placed 20 pins in the map around Latvia and Lithuania. These represent 20 guesses that I pulled randomly from other people playing the game online. Now it may be very hard for you, without any other pictures or information, to guess Bulgaria. So you pick a place in Latvia and call it a day. (The stop sign is actually from Bulgaria. Sorry!)
Our model was doing something similar. Because it was very easy to sample simulations that matched the fundamentals model, and less so to pick numbers that matched the polls given the constraints and uncertainty above, the model hewed closer to the fundamentals than we'd have liked.
Pros and cons of the new model
Our new model has a few advantages and disadvantages compared with the old one. First, the drawbacks: An academic Bayesian would not want to build the model this way, because they would lose the ability to update the probability of each parameter with all the disparate data sources available to them. What this means is that our uncertainty intervals do not shrink as much as you'd expect when stacking the polls on top of the fundamentals, as they would in a normal Bayesian update. We also lose out on any model inferences about the reliability of polls in a state given the comparative differences in polls versus fundamentals in similar states. But as should be apparent to a reader who has followed along this far, modeling is a sophisticated science that, in the practical world, requires a lot of tradeoffs. With our new Harris-Trump model, we are trading off full statistical cohesion for interpretability and, to some extent, parsimony.
However, the new model has real advantages as well. In the future, our new model framework could give us the opportunity to easily combine predictions from many different versions of our model. We can imagine a world where 538 is combining not two but 20 or more plausible models of the polls and fundamentals, each with its own settings or approach to modeling the election. This would further guard against selecting a single specification that breaks down in the future.
Perhaps most importantly, however, the updated model has the advantage of interpretability. I can tell you exactly how much weight we're giving the fundamentals in Wisconsin on Aug. 1, for example, and I can be confident that the polling average today is not being unduly pulled in either direction by injecting the fundamentals into the forecasting component of that model. It's also less likely that an Election Day prediction for a given party in a given state will end up beyond the bounds of the fundamentals or the polling average, which was very confusing to see without a good explanation. Since the whole reason we do this is to help the public understand polls and elections, the added interpretability is a huge benefit.
Footnotes
*The 50 states, the District of Columbia, the two congressional districts in Maine and the three congressional districts in Nebraska. (As a reminder, Maine and Nebraska split their electoral votes by congressional district.)
**We also tested a version of the model that did not fit movement in the polls as correlated draws from a multivariate distribution, but rather a product of national shifts for a party plus some amount of random state-level error. This did not solve the problem.