Revisiting Polling for 2021 and Beyond
The results of the 2020 election have already changed the direction of the country. In less than three months, President Biden and his allies in Congress have “supercharged” the vaccine program, passed historic legislation to cut child poverty in half, and set our battered economy—millions of unemployed Americans, thousands of shuttered businesses—on a path to recovery.
It almost didn’t happen.
In fact, thanks to the quirks of the electoral college, the difference between a new administration and four more years of Donald Trump was merely 43,000 votes cast across Wisconsin, Georgia, and Arizona. For us as pollsters, the tremendous impact of such a small number of votes underscores the importance of continually questioning our assumptions and working to improve our methods to produce more accurate, reliable data. But in 2020, our industry saw major errors and failed to live up to our own expectations.
Together, we represent five survey research firms for Democratic political campaigns. During the 2020 election, we worked on the presidential campaign, every major Senate and gubernatorial race, and congressional races across the country. Our main job as pollsters is to provide campaigns with a strategic roadmap for winning, guide their messaging, and help identify the right targets for those messages.
Every one of us thought Democrats would have a better Election Day than they did. So, what went wrong?
Two weeks after the election, our firms decided to put competition aside to discuss what might have gone awry and to collaborate on finding a solution. There were several factors that may have contributed to polling error in 2020, and there is not a single, definitive answer—which makes solving the problem especially frustrating. In the sections that follow, we seek to explain what we’ve learned thus far in our ongoing efforts to “fix” polling, and what we still need to learn.
Diagnosing the Problem
Following a thorough investigation of polling error in 2016, we all adjusted our weighting protocols to ensure we had enough white non-college voters, and polling seemed to improve in midterm and odd-year elections. But now, as we dig into 2020, we find an error that is not so easy to correct. We saw that in more Democratic states and districts, and in some closely divided states like Georgia and Arizona, the data were quite good. But in more Republican areas, the data were often wrong, sometimes egregiously so.
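The kind of weighting adjustment described above can be sketched with a toy example. The numbers below are invented purely for illustration and are not drawn from any firm’s actual protocol: if white non-college voters make up 40% of the expected electorate but only 25% of the raw sample, each such respondent is weighted up so the sample mix matches the target.

```python
# Minimal post-stratification sketch with invented numbers: reweight
# respondents so the sample's education mix matches the assumed electorate.

respondents = ["white_noncollege"] * 25 + ["other"] * 75  # 25% / 75% raw sample
target = {"white_noncollege": 0.40, "other": 0.60}        # assumed electorate mix

# Share of each group in the raw sample.
sample_share = {g: respondents.count(g) / len(respondents) for g in target}

# Weight = target share / sample share, applied to every respondent in a group.
weights = {g: target[g] / sample_share[g] for g in target}

print(weights)  # underrepresented respondents count for more, others for less
```

In practice each respondent’s answers would then be multiplied by their group’s weight before computing toplines; real protocols rake over many variables at once, not just one.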
Our initial hypotheses in the weeks following the election surrounded two sources of error: turnout and measurement. Turnout error is a miscalibration of the composition of the electorate—who actually showed up to vote relative to who a poll predicted would show up. Measurement error is a miscalibration of attitudes within groups beyond the margin of error inherent in every poll. Preliminary analysis suggests turnout and measurement both played a role in polling error in 2020, but measurement was probably the larger culprit in most places.
Turnout is one of the hardest things for pollsters to account for. Simply asking people whether they will vote is unreliable, because people tend to overreport their likelihood of voting as it’s socially desirable to do so. Instead, campaign pollsters use detailed databases of historical voting records for millions of individual voters to build a universe of what we believe will be the most likely electorate. This approach gets pollsters close to the mark most of the time but depends on the elections of the future looking like the elections of the past—and they often don’t, even when there is no pandemic to worry about.
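As a rough illustration of the vote-history approach (with invented records and a deliberately simplified propensity score, not any firm’s actual model), the idea is to score each voter-file record by past participation rather than asking people whether they plan to vote:

```python
# Toy sketch: build a likely-voter universe from individual vote history.
# Each record: (voter_id, list of past general elections voted in). All invented.
voter_file = [
    ("A", ["2012", "2016", "2018"]),            # frequent voter
    ("B", ["2016"]),                            # sporadic voter
    ("C", []),                                  # never voted
    ("D", ["2012", "2014", "2016", "2018"]),    # votes every time
]

PAST_ELECTIONS = ["2012", "2014", "2016", "2018"]

def turnout_score(history):
    """Crude propensity: share of recent general elections participated in."""
    return len(history) / len(PAST_ELECTIONS)

# Keep everyone above a chosen propensity cutoff as the "likely electorate".
CUTOFF = 0.5
likely_electorate = [vid for vid, hist in voter_file
                     if turnout_score(hist) >= CUTOFF]
print(likely_electorate)  # voters with >= 50% past participation
```

The fragility described in the text lives in that cutoff and score: they encode an assumption that future elections resemble past ones, which a pandemic-era surge of new or irregular voters can break.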
Now that we have had time to review the voter files from 2020, we have found that our models consistently overestimated Democratic turnout relative to Republican turnout in a specific way. Among low-propensity voters—people whom we expect to vote rarely—the Republican share of the electorate exceeded expectations at four times the rate of the Democratic share. This turnout error meant that, at least in some places, we again underestimated relative turnout among rural and white non-college voters, who are overrepresented among low-propensity Republicans.
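The comparison behind that “four times” figure can be made concrete with invented numbers (these are not our actual modeled or observed shares):

```python
# Hypothetical shares of the electorate made up of low-propensity voters,
# by party: modeled expectation vs. what the voter file later showed.
expected = {"R_low_propensity": 0.05, "D_low_propensity": 0.05}
actual   = {"R_low_propensity": 0.09, "D_low_propensity": 0.06}

# How far each party's actual share overshot the expectation.
r_excess = actual["R_low_propensity"] - expected["R_low_propensity"]
d_excess = actual["D_low_propensity"] - expected["D_low_propensity"]

print(r_excess / d_excess)  # Republican overshoot is four times the Democratic one
```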
How to properly account for these voters in higher and lower turnout elections will be a challenge going forward. One solution many of us have experimented with is presenting data based on multiple turnout models, demonstrating how the results might change in different scenarios. We see it as a positive development that some public pollsters (like Monmouth University) have begun to try this.
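A minimal sketch of what presenting multiple turnout models looks like, using invented subgroup support rates and electorate shares: the same subgroup-level results produce different toplines depending on the assumed composition of the electorate.

```python
# Dem candidate's support within each (invented) subgroup.
support = {"white_college": 0.55, "white_noncollege": 0.35, "nonwhite": 0.70}

# Two turnout scenarios: each maps subgroup -> assumed share of the electorate.
scenarios = {
    "higher_dem_turnout": {"white_college": 0.35, "white_noncollege": 0.35, "nonwhite": 0.30},
    "higher_gop_turnout": {"white_college": 0.30, "white_noncollege": 0.45, "nonwhite": 0.25},
}

def topline(shares):
    """Overall support as the share-weighted average of subgroup support."""
    return sum(shares[g] * support[g] for g in shares)

for name, shares in scenarios.items():
    print(f"{name}: Dem support = {topline(shares):.1%}")
```

Reporting both numbers side by side, rather than a single point estimate, is the form of scenario presentation the paragraph above describes.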
Unfortunately, turnout error does not explain the entirety of the error in 2020. In fact, we found our pre-election turnout models were quite accurate on the demographic composition of the electorate across key variables such as age, gender, race, and region. While thinking more rigorously about the demographic composition of the electorate is essential, it is not the only solution. Some other source of error is in play.
The source of measurement error could be one (or more) of a million different things, and plenty of theories have already been discussed, privately and publicly:
- Late movement: This one was a popular theory after 2016, because exit polls and callback surveys suggested late-deciding voters broke overwhelmingly for Trump. In 2020, however, polls were incredibly stable, with hardly any undecided voters throughout the race, so this probably did not play a major role, at least at the presidential level.
- COVID error: Another theory points a finger at the pandemic, which disrupted nearly every aspect of traditional campaigning and undoubtedly had some impact on polling. Perhaps voters with more progressive attitudes on COVID-19 were not only more likely to wear masks and stay at home, but also more likely to answer our poll calls while conservatives remained harder to reach.
- Social trust: Related to the COVID-19 hypothesis is another popular idea that some voters are increasingly opting out of polls due to a lack of “social trust.” High quality social science surveys suggest Americans’ trust in each other has been falling for decades. As some analysts have suggested, Trump may have helped turn this into a problem for pollsters by attracting distrustful voters and making his most ardent supporters even more distrustful of other people, of the media, and perhaps even polling itself. That, in turn, could have made his supporters less likely to answer polls.
While there is evidence that some of these theories played a part, no consensus on a solution has emerged. What we have settled on is the idea that there is something systematically different about the people we reached and the people we did not. This problem appears to have been amplified when Trump was on the ballot: it was the particular voters Trump activated who did not participate in polls.
Understanding these potential blind spots is challenging for obvious reasons, but we can glean some information based on the voter files and the types of voters we did or did not reach. By using modeled “scores” meant to predict individual-level voter attitudes, we can see who might be missing from our polls attitudinally and not just demographically. Our initial analysis has found, for example, that we underrepresented voters who considered Trump to be “presidential.” We also slightly overrepresented voters who support the government taking certain actions to intervene in the economy and people’s lives. While the results are preliminary and the scores themselves imperfect, they give us some indication of where the root of the problem may lie, helping to inform our future work.
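The diagnostic described here can be sketched as follows; the voters, scores, and response flags below are all invented for illustration, and real scores come from models trained on large survey datasets:

```python
# Compare the mean of a modeled attitude score among poll respondents with
# its mean among all voters on the file, to spot attitudinal gaps that
# demographic weighting alone would miss.

# Invented records: each voter's modeled "Trump is presidential" score (0-100)
# and whether they answered the poll.
voters = [
    {"score": 80, "responded": False},
    {"score": 75, "responded": False},
    {"score": 60, "responded": True},
    {"score": 40, "responded": True},
    {"score": 20, "responded": True},
    {"score": 10, "responded": True},
]

def mean(xs):
    return sum(xs) / len(xs)

all_scores = [v["score"] for v in voters]
resp_scores = [v["score"] for v in voters if v["responded"]]

gap = mean(resp_scores) - mean(all_scores)
print(f"electorate mean: {mean(all_scores):.1f}")
print(f"respondent mean: {mean(resp_scores):.1f}")
print(f"gap: {gap:+.1f}")  # a negative gap means high scorers were underrepresented
```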
Where We Go From Here
How do we get people to participate in polls if they won’t answer our phones or respond to surveys online? We don’t have that answer yet. What we can tell you is that, together, we are going to embark on a number of experiments over the course of this year. Along with progressive institutions that want to get to the bottom of this problem, we are going to put every solution, no matter how difficult, on the table. That means thinking about doing door-to-door polling the way it was done before there were telephones. It means thinking about recruiting people to participate in research using paid incentives. And it means continuing to develop multi-modal research tools, knowing the old methodology of calling everyone on the phone cannot be the long-term solution for a society moving more and more of its communication online.
Finally, we have to recognize that polling error has been far more pronounced in presidential elections, especially those with Donald Trump on the ballot, so we should be careful not to overcorrect for an error that may be tied to one man who will, hopefully, never be on the ballot again. At the same time, we may also need to come up with new tools to measure and communicate uncertainty to people who still have to make decisions based on the best information available. For all the flaws in polling and interpretation, it still provides a vital resource to campaigns and organizations—the questions polling answers don’t go away, and polling shouldn’t either (a self-interested point, we know!).
Our industry must figure out how to improve, and it is not going to be easy. Polling was very accurate in some places and inaccurate in others, and the explanation for why is not yet clear. We welcome ideas and collaboration with others from media pollsters to analytics experts to researchers in academia. We believe polling plays a critical role in our democracy and gives a voice to the American people. And we believe we can, and must, do much better.
ALG Research, Garin-Hart-Yang Research Group, GBAO Strategies, Global Strategy Group, and Normington Petts are leading polling and data firms in the Democratic space.