Friday, October 1, 2010

Why I Criticize Newsweek's Polls

Update - The latest data from Gallup further illustrates why I dismiss Newsweek polls. Gallup finds that the GOP has a 3 point lead among Registered Voters, but when Gallup models the data for Likely Voters that GOP lead balloons to 13 points. These polls are tied to an actual upcoming election, so a model (like Gallup's) that seeks to estimate the behavior of likely voters is useful, whereas a model that seeks to estimate the views of all registered voters - those who will vote as well as those who won't (Newsweek's approach) - is meaningless.

I've enjoyed a very constructive e-mail conversation with Pollster.com's Mark Blumenthal today, and based on that exchange I would like to make clear my problems with Newsweek's polls. Essentially it boils down to this - Newsweek is trying to characterize how Americans will vote next month based on a sample of self-identified registered voters, weighted to match general population parameters, rather than on a more reasonable sub-sample or model of the likely electorate. As Blumenthal indicated in our exchange, and as most political scientists are aware, between 75% and 80% of all adults will self-report as registered voters, but midterm turnout among eligible adults typically hovers around 40%. By focusing on the universe of registered voters, weighted to match the population, Newsweek's polls actually tell us little about the upcoming (or any) election.

Allow me to explain my problems with Newsweek's polling methods:

The latest Newsweek poll, unlike all others, shows Democrats with a 5 point lead in the generic ballot - 48% to 43%. But the crosstabs on page 2 of the poll show that Republicans prefer a Republican by a margin of 93 to 4 and Democrats prefer a Democrat by a margin of 96 to 2 - while Independents prefer a Republican by a huge 47% to 30% margin... and yet Newsweek reports that Democrats enjoy a 5 point lead.

Based on the reported sample sizes of 305 Republicans (34%), 327 Democrats (36%), and 234 Independents (26%), this simply cannot be - in fact, based on those numbers the poll should show a 44% to 44% tie. The data is then weighted based on Census data, and the result is an impressive 8 point advantage for Democrats among registered voters in the sample - 39% Dem, 31% Rep, 26% Ind. Then there are another 4% - roughly 36 respondents - who are barely mentioned anywhere in the results but are clearly included in the calculation. If one assumes that roughly 55% of those 36 unaffiliated voters prefer Democrats, then indeed you can arrive at the 48% to 43% number.
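To see the arithmetic, here is a quick back-of-the-envelope sketch (in Python) that rebuilds the topline from the crosstab figures quoted above. Treating the unaffiliated 4% as undecided in the raw sample, and assuming they break roughly 55-45 toward the Democrat in the weighted version, are my illustrative assumptions - Newsweek does not report how that small group splits.

```python
# Rebuild the generic-ballot topline from party composition and the
# within-party preferences reported in the crosstabs (R 93-4, D 96-2,
# I 47-30 toward the GOP). The "Other" splits are assumptions.

def generic_ballot(groups):
    """groups: {name: (share_of_sample, pct_dem, pct_rep)} -> (Dem %, Rep %)."""
    dem = sum(share * d for share, d, _ in groups.values())
    rep = sum(share * r for share, _, r in groups.values())
    return round(dem * 100, 1), round(rep * 100, 1)

# Unweighted composition: 305 R, 327 D, 234 I, plus ~36 others, of ~902 total.
raw = {
    "Rep":   (305 / 902, 0.04, 0.93),
    "Dem":   (327 / 902, 0.96, 0.02),
    "Ind":   (234 / 902, 0.30, 0.47),
    "Other": (36 / 902, 0.00, 0.00),   # left undecided here
}

# Census-weighted composition as reported: 39% Dem, 31% Rep, 26% Ind, 4% other,
# with the others assumed to split roughly 55-45 toward the Democrat.
weighted = {
    "Rep":   (0.31, 0.04, 0.93),
    "Dem":   (0.39, 0.96, 0.02),
    "Ind":   (0.26, 0.30, 0.47),
    "Other": (0.04, 0.55, 0.45),       # assumed split
}

print(generic_ballot(raw))       # ~ (43.9, 44.4): essentially a tie
print(generic_ballot(weighted))  # ~ (48.7, 43.6): close to the reported 48-43
```

The interviews themselves point to a dead heat; the reported Democratic lead appears only after the party composition is reweighted.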

If you weight your data so that a 2 point Democratic edge in party identification becomes an 8 point edge, you're likely to find some good news for the Democrats. But is it realistic? The Newsweek numbers are akin to the advantage enjoyed by Democrats in 2008, an unlikely scenario in 2010.

I'm not questioning Newsweek's motives, but I am suggesting that the assumptions upon which they weight their data are flawed when trying to predict actual voting. As flawed as the recent Washington Post poll of the Maryland gubernatorial contest and as flawed as the recent surveys in California that showed Democrats surging. If you ignore past turnout models, if you dismiss evidence of strong GOP turnout in your own sample, and if you make a series of assumptions that work only to the benefit of Democrats then your polls will indeed show good news for Democrats - but will they be accurate?

Weighting by "gender, age, education, race, region, and population density" based on Current Population Survey data is not at all uncommon, but may not be the best predictor of actual voting. If I were interetsted favorite ice cream flavor I would weight by CPS, but not for voting. For instance, were a firm to survey Maryland voters and then weight the sample for known population parameters the survey would overstate the share of the electorate that would be African American or from Baltimore City and understate the share that would be Republican. Whereas a review of actual turnout data would show that the actual voters in Maryland are quite different from the universe of possible voters derived from the CPS.

This is why building a weight based on actual past voter turnout and exit polls from prior elections is more useful and appropriate. Can anyone seriously say that the electorate come November is going to be 39% Democrat to 31% Republican? That Democrats will match their 2008 performance? The fact that the initial sample was 36% Dem to 34% Rep should have been an indicator of voter intention and preference.
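For readers who want to see the mechanics, below is a minimal sketch of post-stratification by party ID: each respondent's weight is the target share of his or her group divided by that group's share of the raw sample. The toy respondent data simply mirror the crosstab preferences above, and both target compositions are illustrative - the "turnout" profile is a hypothetical past-election mix, not Newsweek's (or anyone's) actual weighting scheme - but the exercise shows that the choice of target, not the interviews, drives the size of the reported lead.

```python
# Post-stratification by party ID on a toy sample of 300 respondents whose
# within-party preferences mirror the crosstabs above (R 93-4, D 96-2, I 47-30).
from collections import Counter

respondents = (
    [("Dem", "D")] * 96 + [("Dem", "R")] * 2 + [("Dem", None)] * 2 +
    [("Rep", "R")] * 93 + [("Rep", "D")] * 4 + [("Rep", None)] * 3 +
    [("Ind", "R")] * 47 + [("Ind", "D")] * 30 + [("Ind", None)] * 23
)

def weighted_topline(data, target):
    """Weight each respondent by (target share of their party) / (sample share)."""
    sample = Counter(party for party, _ in data)
    n = len(data)
    weights = {p: target[p] / (sample[p] / n) for p in target}
    dem = sum(weights[p] for p, vote in data if vote == "D")
    rep = sum(weights[p] for p, vote in data if vote == "R")
    total = sum(weights[p] for p, _ in data)
    return round(100 * dem / total, 1), round(100 * rep / total, 1)

population_target = {"Dem": 0.39, "Rep": 0.31, "Ind": 0.30}  # Census-style profile
turnout_target    = {"Dem": 0.36, "Rep": 0.35, "Ind": 0.29}  # hypothetical turnout profile

print(weighted_topline(respondents, population_target))  # ~ (47.7, 43.7): Dem lead
print(weighted_topline(respondents, turnout_target))     # ~ (44.7, 46.9): GOP edge
```

Same respondents, same answers - only the assumed composition of the electorate changes, and with it the headline.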

To quote from Alvarez and Nagler (2005) "If the known and observable attributes (say age, which we observe in the population and measure in the sample) are correlated with the political behavior we want to estimate (say the Kerry-Bush vote in the 2004 election), then if we have a sample that does not accurately represent the age distribution of the electorate adjustment to the age distribution of our sample can help improve the accuracy of our behavioral estimate. This is where reliable weighting methodologies can help improve the accuracy of our inferences; this is also where poor weighting methodologies can lead our inferences astray."

Newsweek ignores actual vote trends and turnout in favor of a methodology that will almost always result in a weighted sample that is more Democratic than was the initial sample and, based on their track record, the actual voters.

One may well ask "Why should an unweighted sample with known biases toward older, whiter, better educated and more affluent Americans produce a more reliable estimate of party ID than a sample weighted to match Census estimates of gender, age, education, race, region, and population density?"

A reasonable answer would be: perhaps because in the current electoral environment, and in fact in nearly every election, the electorate will be older, whiter, better educated and more affluent than the population of registered voters, and especially the actual population of those eligible to vote. Weighting a sample based on the CPS so that 21% of the sample is aged 18-29, for example, would not improve the findings given that 18-29 year olds are only 18% of actual voters. Nor would weighting it such that those 65+ were 12% of the sample help when they are in fact 16% of voters. When the weights that one applies result in a party ID breakdown that appears to be clearly at odds with present conditions and past voting patterns, then it may be time to reconsider those weights.
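The size of the distortion is easy to compute: each age cell's effective weight is simply its target share divided by its real share of the electorate. Using only the figures quoted in the paragraph above, purely as an illustration:

```python
# Implied over- or under-counting when weighting to a Census-style age profile
# instead of the actual voter mix (shares are the ones quoted above).
cells = {
    "18-29": {"census_weighted_share": 0.21, "actual_voter_share": 0.18},
    "65+":   {"census_weighted_share": 0.12, "actual_voter_share": 0.16},
}

for age, c in cells.items():
    ratio = c["census_weighted_share"] / c["actual_voter_share"]
    print(f"{age}: counted at {ratio:.2f}x their real share of voters")
    # 18-29: ~1.17x (younger voters over-counted by about a sixth)
    # 65+:   ~0.75x (older voters under-counted by about a quarter)
```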

I understand that the Newsweek poll is in fact in line with other national polls given the margin of error, but Newsweek "makes news" with its polls. When it writes stories suggesting that the 2010 political terrain isn't really that bad for Democrats (based solely on its own polls), as it did this month and last, I think it is only right to point out that its methods have a long history of overstating Democratic turnout. An unmotivated Democratic voter reading the latest issue of Newsweek may well conclude, "gee, 2010 might not be so bad, I guess I don't need to vote."