A few months ago, I sketched preliminary explanations of last November’s election; those conclusions still hold up well. This post addresses how well (or how poorly) the election polling did, why, and what the answers imply for using polls as a voice of popular opinion.
Putting the major polls together, their miss in last year’s presidential election was, on average, 4 percentage points, mainly because they underestimated the Trump vote; they also underestimated the Republican down-ballot votes by about the same margin. (FiveThirtyEight.com’s final averages of polls gave Biden an 8.4-point lead; he ended up winning by 4.4 points.) As presidential election forecasts in recent decades go, this error was roughly average.
However, the 2020 polling stirred considerable and appropriate consternation; Politico declared the morning after that “the polling industry is a wreck and should be blown up.” The reasons for consternation include these:
* Although the polls got the electoral college winner right this time, the 2020 error was actually larger than the 2016 error, which was only 1.8 percentage points (Clinton was predicted to win the popular vote by 3.9 points, but won it by 2.1 points).
* This deterioration in accuracy occurred despite major efforts by polling organizations to fix the apparent 2016 problems and notable improvement in the 2018 off-year elections. The average 2018 error in forecasting party shares of the congressional vote was exactly zero. FiveThirtyEight.com declared that the “Polls are Alright.”
* In particular states (e.g., Wisconsin, Florida) the 2020 presidential polling error was much larger than the national 4 points.
* Many projections for down-ballot races, such as the Senate race in Maine, performed a lot worse than the presidential ones.
* The polls’ errors leaned in the same direction as in 2016, underestimating the Republican vote yet again.
Post-mortems on the election now have some analysts and some political action groups (e.g., Swing Left) looking to rely less on polling going forward and more on “fundamentals” such as how a district voted in prior elections.
What Went Wrong?
The polling industry will eventually home in on the best explanations for the 2020 errors and figure out which tasks pollsters failed at and why. Meanwhile, here are what appear to be the leading candidate explanations, roughly in declining order of likelihood and importance. Obviously, more than one distortion could have been at work. Keep in mind that pollsters forecasting elections have to do three things successfully: draw a sample that represents (or can, with adjustments, be made to represent) the population of potential voters; assess which candidate the people who answered the poll are leaning toward; and estimate the chances that those people will actually vote. Each step is precarious.
Alienated Trump Voters Didn’t Respond. In an era when only a few percent of the relevant population answer most non-governmental surveys, it is critical to understand how well the people who do answer represent the whole population. “Weighting” the actual respondents so as to simulate what a 100% response rate would have yielded is a key tactic for dealing with variations in willingness to respond. (As a simple example, if the population you intend to study is composed 50% of men, but men were more reluctant than women to participate, so your sample of respondents ended up with only 35% men, you can, making a couple of brave assumptions, count each male respondent’s answers [50 ÷ 35 =] 1.43 times to estimate what the results would have been if there had been no gender difference in responding.) If, however, there is an important characteristic of the non-responders that cannot be captured by gender, age, race, and the like, the weighting solution is much more difficult. One such characteristic is being the sort of person who is particularly distrustful, socially isolated, and uninvolved in the community. Such people avoid answering surveys more often than do people who may be like them demographically but are not alienated in this way. (See here and here.)
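The weighting arithmetic in the parenthetical above can be sketched in a few lines of code. The response shares and support percentages here are hypothetical, and real pollsters weight on many variables at once, but the mechanics are the same:

```python
# Post-stratification weighting sketch (hypothetical numbers): adjust
# respondents' answers so the sample's gender mix matches the population's.

population_share = {"men": 0.50, "women": 0.50}  # known population mix
sample_share = {"men": 0.35, "women": 0.65}      # mix among actual respondents

# Weight for each group = population share / sample share.
weights = {g: population_share[g] / sample_share[g] for g in population_share}
# Each man's answers now count ~1.43 times; each woman's ~0.77 times.

def weighted_estimate(support_by_group):
    """Population-level support estimated from group-level sample support."""
    return sum(
        support_by_group[g] * sample_share[g] * weights[g]
        for g in support_by_group
    )

# Hypothetical: 60% of male and 40% of female respondents back a candidate.
estimate = weighted_estimate({"men": 0.60, "women": 0.40})
print(round(weights["men"], 2))  # 1.43
print(round(estimate, 2))        # 0.5
```

Note what the weighting cannot do: if survey-averse Trump voters differ from the demographically similar people who did respond, no amount of reweighting on gender, age, or race will recover their answers.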
These missing respondents used not to matter much for election polling, because they also tended not to vote, or, when they voted, tended to break about evenly between parties. However, a trend emerged, even before Trump’s ride down the escalator: growing Republican skepticism about answering polls. Then in 2016, and even more so in 2020, many of the alienated and angry did come out to vote. And they voted Republican. The theory is that Trump appealed to a large segment of typically nonvoting Americans. He brought them out to vote for him, and they generally voted the full GOP ticket as well. Indeed, Trump voters were likelier than Biden voters to cast ballots below the presidential level. At the same time, a small but noteworthy portion of regular GOP voters picked Biden at the top of the ticket. These developments help explain why down-ballot Republican candidates outran Trump and outperformed the polls.
Related to the poll-averse Trump voter is the explanation that liberals, many of whom were at home during the pandemic, were so enthusiastic about answering surveys that the polls over-counted them (see here and here).
Miscalculating Who Would Vote. This explanation overlaps with the previous one. Pollsters may have underestimated how likely the alienated people who did answer the polls were to vote. (One predictor of whether respondents are likely to vote, for example, is whether they had voted in recent elections. But these folks typically had not voted recently.) More Republicans turned out on election day than the experts expected. Conversely, Democrats’ turnout may have been overestimated because they were so enthusiastically hostile to Trump when talking to pollsters (see here and here).
The Pandemic. Biden did worse than expected in places where Covid-19 was most virulent. Perhaps that contributed to Democrats being over-represented in the surveys and under-represented on election day. Or perhaps the pandemic mattered because Republicans continued with their door-to-door registration and get-out-the-vote efforts while the Democrats curtailed theirs as a public health measure. Similarly, perhaps some Democrats who could be expected to vote in a normal year did not turn out on election day because they were concerned about possible infection. (Republicans were clearly less worried about Covid; see, e.g., here and here.)
Late Deciders. Another plausible explanation is that the predictions were wrong because voters who were undecided on the last weekend before the election broke heavily for Trump by the time they arrived at the voting booths. Perhaps Trump’s flurry of rallies late in the campaign really did pay off as he claimed, but moved voters too late for the polls to capture.
Shy Trumpers? A common claim, made during both the 2016 and 2020 campaigns, is that some Trump supporters are reluctant to admit their preference for him even when they do answer a survey; they just lie. This issue was explored after the 2016 surprise and again in 2020. Most analysts think this pattern contributed, at best, only a little to the polling error, although a few continue to pursue it. Emily Ekins of the Cato Institute finds that highly educated Republicans were much likelier than highly educated Democrats to say that they censor themselves. And, Ekins argues, they probably distrust pollsters’ claims of confidentiality, so they don’t admit their Trump allegiance. Still, this is a minority view among the experts, at least so far.
Yet other explanations are out there, for example, that liberal media (say, MSNBC, The New York Times) were so confident of a large Biden win that many Democrats who follow those media didn’t bother to turn out; another is that the key error involved Latino voters’ movement toward Trump, one that was undetected by the polls.
Conclusion
The polling professionals will, once again, try to figure out what went wrong and how to fix it for the next election. Perhaps the urgency to repair is less than it was four years earlier because, although their errors were larger, the polls got the correct winner. The aggregators and predictors, like FiveThirtyEight.com or The New York Times, may, as suggested above, come to rely less on polls and more on measures like economic indicators, registrations, and voting trends to make their projections.
But horse-race prediction aside, the concerns for general surveys, those done to gauge public opinion, measure social trends, assess social theories, and so on, are less urgent. For such surveys one doesn’t need to estimate turnout likelihood, worry about late deciders, care about 3- or 4-point errors, or deal with Trump.
The Pew Research Center conducted a serious analysis attempting to answer the question: How might the pro-Biden bias of the 2020 election polls have distorted findings about public opinion on important issues like taxes or trade? The answer was: hardly at all. Why not? Because the public includes millions of people who did not (and generally do not) vote but who still have opinions; and because, while there is a connection between people’s positions on issues like abortion or guns and the party they vote for, it is not a strong connection. For example, about one-fourth of Trump voters told Pew that the federal government should assure health care coverage for all Americans. In Pew’s analysis, the few issues that would have shown a difference in public opinion by as much as three points, had the polls been accurate about Biden vs. Trump, were issues that flared up in the election, notably race and Covid-19. But one would not have drawn different overall conclusions about Americans’ views on policy even if the polls had correctly captured the Trump voters.
Despite this analysis, survey researchers generally need to re-evaluate the classic random telephone interview; the surveys using that method tended to have the greatest error, perhaps because of the alienated non-responders. “Polls tend to overrepresent people interested and engaged in politics as well as those who take part in volunteering and other helping behaviors,” Pew reported. And that bias can affect research on topics far beyond who will win an election or opinion on politics, topics such as social connectedness, family life, and mental health. So, more is at stake than just winning election bets.
Update April 14, 2021
A group of liberal polling firms did their own post-mortem and here are a couple of key conclusions:
- . . . we found our models consistently overestimated Democratic turnout relative to Republican turnout in a specific way. Among low propensity voters—people who we expect to vote rarely—the Republican share of the electorate exceeded expectations at four times the rate of the Democratic share. This turnout error meant, at least in some places, we again underestimated relative turnout among rural and white non-college voters, who are overrepresented among low propensity Republicans.
- . . . there is something systematically different about the people we reached, and the people we did not. This problem appears to have been amplified when Trump was on the ballot, and it is these particular voters who Trump activated that did not participate in polls.
Update May 12, 2021
Data for Progress, a Democratic-aligned research outfit, also produced an analysis. Two of its conclusions are:
- Partisan Nonresponse and Activist Overrepresentation: Conservative white voters are opting out of polling, while liberal voters are disproportionately opting in, creating an underlying bias in our respondent pools. We also have evidence that liberal partisan activists are systematically overrepresented in our surveys.
- Geographic Heterogeneity in Respondents: There are substantial differences between urban and rural white voters in terms of likelihood of voting Biden in 2020 or switching from Trump to Biden. Respondents living in zip codes which display the most firm Trump support are less likely to respond to polls — even when you control for their partisanship and other demographics.
Update July 21, 2021
The American Association for Public Opinion Research issued its own analysis. Among the conclusions:
- The 2020 polls featured polling error of an unusual magnitude: It was the highest in 40 years for the national popular vote and the highest in at least 20 years for state-level estimates of the vote in presidential, senatorial, and gubernatorial contests.
- The polling error was much more likely to favor Biden over Trump.
- The overstatement of the Democratic-Republican margin in polls was larger on average in senatorial and gubernatorial races compared to the presidential contest.
- No mode of interviewing was unambiguously more accurate. Every mode of interviewing and every mode of sampling overstated the Democratic-Republican margin relative to the final certified vote margin.
- What does not explain the error: late-deciding voters voting for Republican candidates; incorrect assumptions about the composition of the electorate; respondents’ reluctance to tell interviewers they supported Trump; and error in estimating whether Democratic and Republican respondents voted.
- What might explain the error (it is impossible to identify the precise cause(s) of the polling error documented here without knowing the opinions and demographics of voters who were and were not included in polls): statements by Trump could have transformed survey participation into a political act, whereby his strongest supporters chose not to respond to polls; corrections that worked for 2016 did not work in 2020; and the 2020 pre-election polls were not successful in correctly accounting for new voters who participated in the 2020 election.