How pollsters are (and aren't) fighting fraud
Recent headlines about China deploying fake social media accounts to try to mislead voters ahead of the 2024 election caught the eye of one of our readers. “[I] started thinking about how, if you set up enough false accounts, the Chinese or the Russians [...] could conceivably manipulate seemingly legitimate poll results,” Rebecca McPheters wrote in an email to 538. It was an intriguing hypothetical, but we don’t have to imagine worst-case scenarios to identify threats to polling; very real ones are already well documented.
As it has become increasingly difficult to get respondents to pick up the phone and take a survey, more and more pollsters have been relying on other methods, such as text-to-web and online opt-in surveys, to gather data. But these approaches come with their own challenges and risks, including the possibility of fraudulent responses from click farms, professional poll-takers or nefarious meddlers. There are lots of safeguards pollsters can use to mitigate these threats, but as fraudsters grow savvier and tools like generative artificial intelligence become increasingly accessible, are those safeguards enough to keep polls accurate and secure?
Let’s start with the basics. Pollsters often offer small monetary compensation in exchange for completing a survey. That attracts bad actors, including professional survey takers (who may not answer honestly or accurately, since they’re just trying to get through as many surveys as quickly as possible) and fraudsters who automate responses to complete surveys en masse and rack up rewards.
While it doesn’t amount to a lot of money in either case, for individuals in some developing countries it can be enough to make the effort worthwhile. Last year, InnovateMR and OpinionRoute, two market research firms, identified and interviewed individuals in Venezuela and Bangladesh who fraudulently complete surveys for money, and learned they could earn between $200 and $2,500 a month.
“In isolation, it may not sound like a meaningful amount to the average person. But at scale, if they're able to find ways to navigate and break through companies’ defenses, they're able to not just take one survey, but a multitude of surveys — sometimes the same survey over and over and over again — to then scale and aggregate a larger incentive pool for themselves,” said Lisa Wilding-Brown, the CEO of InnovateMR.
Standard methods for securing surveys include simple checks: making sure a respondent’s IP address matches their stated location; using “trap” questions to catch automated responses (such as “if you’re paying attention, please respond to this question with answer C”); evaluating how long a respondent takes to complete a survey (finishing what should be a 20-minute survey in five minutes is a sign something is amiss); and looking for identical wording (or gibberish) in answers to open-ended questions from multiple respondents. With the emergence of generative AI, however, fraudsters may be able to outsmart the tools put in place to keep them at bay, solving problems like trap questions and eliminating giveaways like repeat answers and gibberish. Other, more detailed analyses can also be used to weed out bad-faith responses.
“We will see a lack of knowledge about politics in general: people who say that they're extremely motivated to vote but can't answer questions that somebody who really is truly a likely voter should be able to answer,” said Ken Alper, the president of SurveyUSA. His firm also looks for “people who have inconsistent opinions,” he added. “You know, some of this is just catching people who aren't paying attention either. But, for instance, if you have a strongly favorable opinion of both Trump and Biden, chances are something's awry, and we're going to look a lot more carefully at every response.”
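To make these checks concrete, here’s a minimal sketch of how a polling team might automate them over a batch of responses: an IP/location mismatch, a failed trap question, an implausibly fast completion time, duplicate open-ended text and, echoing Alper’s point, contradictory favorability ratings. Every field name and threshold here is an illustrative assumption, not any pollster’s actual rule.

```python
# Hypothetical quality checks on survey responses (illustrative only).
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Response:
    respondent_id: str
    stated_state: str          # state the respondent says they live in
    ip_state: str              # state geolocated from their IP address
    trap_answer: str           # answer to "please respond with answer C"
    completion_seconds: int    # time taken to finish the survey
    open_ended: str            # free-text answer to an open-ended question
    fav_trump: bool            # strongly favorable opinion of Trump?
    fav_biden: bool            # strongly favorable opinion of Biden?
    flags: list = field(default_factory=list)


def flag_suspicious(responses, expected_seconds=1200, trap_correct="C"):
    """Attach quality flags; anything flagged goes to manual review."""
    # Open-ended answers repeated verbatim across respondents are a classic
    # sign of scripted or copy-pasted submissions.
    text_counts = Counter(r.open_ended.strip().lower() for r in responses)
    for r in responses:
        if r.ip_state != r.stated_state:
            r.flags.append("ip_location_mismatch")
        if r.trap_answer != trap_correct:
            r.flags.append("failed_trap_question")
        # Finishing a ~20-minute survey in a quarter of the expected time
        # suggests straight-lining or automation.
        if r.completion_seconds < expected_seconds * 0.25:
            r.flags.append("too_fast")
        if text_counts[r.open_ended.strip().lower()] > 1:
            r.flags.append("duplicate_open_ended")
        if r.fav_trump and r.fav_biden:
            r.flags.append("inconsistent_favorability")
    return [r for r in responses if r.flags]


batch = [
    Response("a1", "OH", "OH", "C", 1150, "Lower grocery prices.", False, True),
    Response("a2", "OH", "TX", "B", 240, "good survey thanks", True, True),
    Response("a3", "PA", "PA", "C", 260, "good survey thanks", False, False),
]
for r in flag_suspicious(batch):
    print(r.respondent_id, r.flags)
```

In this toy batch, the first respondent passes clean while the other two pick up flags; a real pipeline would tune each threshold and decide which combinations warrant removal versus a second look.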
When it comes to political pollsters, though, one of the main techniques used to try to ensure they’re getting legit responses is fairly basic: cross-referencing with voter files. Pollsters often contact voters through the phone numbers associated with their voter file. Then, they can use the demographic information those respondents provide to verify that the person on the other end of the line is the voter they were trying to reach. Some pollsters are OK with speaking to another person in the household, but others are strict about only speaking to the intended voter. Either way, the political pollsters I spoke to all cited voter file cross-referencing as the main and best way they verify their results.
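In code, that cross-referencing step can be as simple as comparing what a respondent says about themselves with the record tied to the number that was dialed. The sketch below assumes a made-up voter-file layout and a one-year tolerance on age; real voter files and matching rules are considerably more involved.

```python
# Hypothetical voter-file cross-reference (illustrative data and rules).
VOTER_FILE = {
    # phone number -> record pulled from the voter file
    "5550100": {"name": "J. Smith", "birth_year": 1969, "gender": "F", "zip": "43004"},
}


def matches_voter_record(phone, reported_age, reported_gender, reported_zip,
                         survey_year=2024, age_tolerance=1):
    """Return True if the respondent's answers line up with the dialed record."""
    record = VOTER_FILE.get(phone)
    if record is None:
        return False  # the number isn't tied to the voter we meant to reach
    expected_age = survey_year - record["birth_year"]
    return (
        abs(reported_age - expected_age) <= age_tolerance
        and reported_gender == record["gender"]
        and reported_zip == record["zip"]
    )


# A respondent at this number reporting age 55, female, ZIP 43004 checks out;
# a 23-year-old man at the same number would be set aside for review instead.
print(matches_voter_record("5550100", 55, "F", "43004"))   # True
print(matches_voter_record("5550100", 23, "M", "43004"))   # False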
Pete Brodnitz, the founder and president of Expedition Strategies, a polling firm that has worked with political candidates, said he remembers distinctly the first time he came across an online opt-in survey during a Senate race he was working on. “A poll came out that had really squirrelly results, and I looked and realized, ‘Oh, I can just sign up myself. I can just completely make up my demographics.’ And that's ludicrous,” he said. “We realized in 2012 that if we didn't have the individual from the voter file, then it was impossible for me to know whether or not the sample I got was a representative sample.”
The problem is that none of these tactics — the IP checks, the trap questions, even the voter file matching — are foolproof, and there’s evidence that they may not be going far enough. Studies from the Pew Research Center, for example, have shown that online opt-in surveys are susceptible to large errors, particularly when surveying young adults and Hispanics. In one study, 12 percent of adults under 30 answered “yes” when asked if they were licensed to operate a nuclear submarine, and 24 percent of respondents who said they were Hispanic said they had such a license. In reality, fewer than 1 percent of Americans are licensed to operate a nuclear sub.
Pew also investigated a recent YouGov/The Economist poll where 20 percent of respondents under 30 said they believed the Holocaust “is a myth.” When Pew tried to replicate these findings with a more rigorous, probability-based mail survey, the results were wildly different: Just 3 percent of young Americans agreed with the statement denying the Holocaust. Pew has also found that online opt-in polls consistently include between 4 and 7 percent bogus respondents, introducing a measurable systematic bias.
Randy Ellison, the founder of Targoz Market Research, a public opinion and market research consulting firm, has written extensively about the shortcomings of the checks and balances put in place to weed out fraudulent survey respondents. He has found that, despite implementing industry-standard protections like trap questions, 30 percent of respondents in his 2020 polling were flagged when their IP address was checked with a fraud detection service. Fraudulent responses were also 30 percent higher in political polls than in consumer polls.
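Commercial fraud-detection services score IP addresses against databases of known proxies, VPNs and data centers. As a rough, simplified stand-in for that kind of screening, the sketch below checks respondent IPs against a small, made-up list of suspect ranges using Python’s standard ipaddress module; the ranges are documentation addresses, not real threat intelligence.

```python
# Simplified stand-in for third-party IP fraud screening (illustrative only).
import ipaddress

# Hypothetical ranges a vendor might associate with data centers or proxies.
SUSPECT_NETWORKS = [
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def flag_rate(respondent_ips):
    """Return the share of respondents whose IP falls in a suspect range."""
    flagged = 0
    for raw in respondent_ips:
        ip = ipaddress.ip_address(raw)
        if any(ip in net for net in SUSPECT_NETWORKS):
            flagged += 1
    return flagged / len(respondent_ips)


ips = ["203.0.113.7", "192.0.2.44", "198.51.100.9", "192.0.2.80"]
print(f"{flag_rate(ips):.0%} of responses flagged")  # 50% in this toy batch
```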
Ellison now checks all his responses for IP address fraud, something he said is common practice in the commercial polling industry. But while political pollsters use a voter-file-generated list to guard against the potential pitfalls of online opt-in surveys, Ellison said he’s not sure that’s a panacea, either.
“You're assuming that the cell phone number that's attached to that voter record is correct,” Ellison explained. He also said he recently checked a list of 5,000 voter file respondents and found more than 90 percent had been “compromised,” meaning their information was available online.
Further work then needs to be done — checking to see if the phone numbers are active, and then cross-referencing the demographic information in the voter file data to ensure it matches — to truly verify the responses. “I feel like I've been shouting from the rooftops and nobody’s listening,” Ellison said. Though he said this is a common topic of discussion in commercial polling, “on the political side, nobody's really talking about it. It's a vulnerability to worry about.”
While voter-file-matched direct polling helps avoid many of the flaws of online opt-in surveys, the landscape is constantly evolving, and bad actors have more tools than ever before to trick pollsters. The best protection is likely a multi-layered approach in which no single safeguard is relied upon, similar to the “Swiss cheese” strategy in public health. Imagine a slice of Swiss cheese, with random holes a fraudster could slip through. Stack several slices, though, and the holes rarely line up, so whatever gets past one layer is likely to be caught by another.
This is the approach shrewd companies in the commercial sector are taking, according to Wilding-Brown. Those on the political side, where the stakes are even higher, may want to take note.
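To make the layering concrete, here is a rough sketch of how such a pipeline might look in code: each check stands in for one slice of cheese, and a response that fails any layer is routed to manual review rather than counted. The layer names, thresholds and data fields are illustrative assumptions drawn from the checks discussed above, not a description of any firm’s actual pipeline.

```python
# Hypothetical "Swiss cheese" layering of independent checks (illustrative).
def ip_location_matches(resp):
    return resp["ip_state"] == resp["stated_state"]

def passed_trap_question(resp):
    return resp["trap_answer"] == "C"

def plausible_completion_time(resp):
    return resp["completion_seconds"] >= 300  # a quarter of a 20-minute survey

def matches_voter_file(resp):
    return resp["voter_file_match"]  # e.g., age, gender and ZIP line up

LAYERS = [
    ip_location_matches,
    passed_trap_question,
    plausible_completion_time,
    matches_voter_file,
]

def keep_response(resp, max_failures=0):
    """No single layer decides; by default, even one failure sends the
    response to manual review instead of the final sample."""
    failures = [layer.__name__ for layer in LAYERS if not layer(resp)]
    return len(failures) <= max_failures, failures


resp = {"ip_state": "OH", "stated_state": "OH", "trap_answer": "C",
        "completion_seconds": 200, "voter_file_match": True}
print(keep_response(resp))  # (False, ['plausible_completion_time'])
```

The design point is the same one Wilding-Brown makes: any single check has holes, but a fraudster has to slip through every layer at once to end up in the final sample.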