MORI’s most recent political monitor included a question asking about how people voted at the last election. Since they don’t use it for weighting purposes, it isn’t a question that MORI regularly ask (or at least, it isn’t one they regularly publish) and it’s a good opportunity to see just how much difference political weighting makes to a poll.
I mention political weighting in polls a lot here, but it’s been a long time since I’ve looked at what it is, why it is done and what difference it makes. In short, all polls use methods that are supposed to generate representative samples, i.e. they have the correct number of people from each region in the country, the right spread of people in different age brackets, the right mix of men and women and so on. No method is perfect though, so weighting is used to iron out the differences. For example, amongst UK adults 52% of the population are female and 48% male. If you had a sample that was 55% female and 45% male you’d have too many women in your sample, so you would weight them down - specifically, you’d make every female respondent count as 0.95 of a person, and every man count as 1.07 of a person, so that when you totalled everything up it would be the equivalent of having 52% women and 48% men.
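To make the arithmetic concrete, here is a minimal sketch in Python of that gender weighting, using the figures from the example above. The function name `cell_weights` and the dictionary layout are purely illustrative choices, not any pollster’s actual code.

```python
def cell_weights(target, sample):
    """Weight for each group: its target share divided by its share in the sample."""
    return {group: target[group] / sample[group] for group in target}

# Shares from the example in the text: population targets vs. the achieved sample.
target = {"female": 0.52, "male": 0.48}
sample = {"female": 0.55, "male": 0.45}

weights = cell_weights(target, sample)

# Applying each weight to its group's sample share recovers the target share.
weighted_shares = {group: sample[group] * weights[group] for group in sample}

for group in target:
    print(f"{group}: weight {weights[group]:.2f}, weighted share {weighted_shares[group]:.0%}")
```

Each woman ends up counting as a little under one person and each man as a little over one, and the weighted sample splits 52/48.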
Political weighting is more controversial and more difficult to do because it isn’t clear what the correct proportions are. On age and gender we have figures from the census, so we know what the real demographics are. For people’s politics we don’t - the best we have is the last general election. We know that in May 2005 around 33% of those who voted backed the Tories, around 36% of voters backed Labour and so on. In theory a pollster should be able to ask respondents how they voted in the last election and then weight the sample so it matches. The problem with this is “false recall” - if you take a panel study, i.e. ask the same group of people how they voted at the last election, and then ask them the same question 6 months later, and then another 6 months later, they should give the same response each time: we can’t, after all, go back and change how we voted. In practice though it has been tried, and the results do change over time. People who didn’t actually vote start pretending they did, people who voted tactically give the name of the party they really supported, people say how they’d have liked to vote rather than how they really did, people who voted for minor parties forget, and so on. Because past vote recall isn’t set in stone and changes in this way, it is arguably unsuitable as a weighting variable - we never know what the correct picture we should be aiming at is.
So why do all the main phone pollsters still do it? Because they think it’s better than the alternative of just doing nothing. When it comes to actual elections, polls without weighting (or some other major adjustment) grossly overestimate the level of Labour support. Without political weighting, if you ask how people voted at the last election you will tend to get answers around CON 26%, LAB 48%, LDEM 17%. Given that this is 7 points below what the Conservatives got at the last election and 12 points above what Labour actually got, it seems self-evident that such a sample is grossly over-sampling Labour supporters and grossly under-representing Conservative supporters. Some of that discrepancy though is not due to a biased sample, but to false recall, so it isn’t as simple as weighting to the actual result of the last election. Instead the pollsters who use weighting by past vote need to estimate what levels of recalled vote a truly representative sample would produce, and then weight to that. Populus do this by assuming that the difference is split roughly 50/50 between sample bias and false recall, and weighting to a point half way between the actual result and the average recalled vote in their unweighted samples. ICM do something similar, but put the point closer to the actual results.
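As a rough illustration of the “half way” idea, the sketch below blends the actual result with the average recalled vote. It uses only the rounded Conservative and Labour figures quoted above; the function `blended_target` and its `false_recall_share` parameter are my own hypothetical names, and this is not meant to reproduce either company’s published targets, which also have to deal with other parties, non-voters and refusals.

```python
def blended_target(actual, recalled, false_recall_share=0.5):
    """Weighting target between the actual result and the average recalled vote.

    false_recall_share is the assumed fraction of the gap put down to false
    recall rather than sample bias: 0.5 gives the half-way point, and a
    smaller value puts the target closer to the actual result.
    """
    return {
        party: actual[party] + false_recall_share * (recalled[party] - actual[party])
        for party in actual
    }

# Rounded figures quoted in the text (percentages of those who voted).
actual_2005 = {"CON": 33, "LAB": 36}
avg_recalled = {"CON": 26, "LAB": 48}

halfway = blended_target(actual_2005, avg_recalled)           # 50/50 assumption
nearer_actual = blended_target(actual_2005, avg_recalled, 0.25)  # illustrative only

print(halfway)
print(nearer_actual)
```

The 0.25 in the second call is an arbitrary value chosen to show a target nearer the actual result; the text only says that ICM put the point closer to the actual results, not by how much.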
Weighting by past vote (or other political weighting) also has the advantage of stability - the make up of the political sample each month is, in theory at least, the same, so if Labour go up 4 points from last month we can be confident that they have actually gone up, rather than us just having a sample with more past Labour voters in it (within the normal bounds of sample error and so on of course).
So where does MORI come into this? MORI don’t weight by past vote because of the concerns about the volatility of past vote recall. They are concerned that past vote recall itself can change from month to month - ICM and Populus’s figures suggest that it is relatively stable over time, but that doesn’t mean it can’t shift in the future. MORI don’t normally use phone polling, they use quota sampling, so there is actually no reason to think their raw samples will resemble the phone samples used by ICM and Populus. Last month’s figures though suggest that they do - MORI’s sample had a recalled vote of CON 27%, LAB 47%, LDEM 19%. Populus’s last poll had unweighted figures of CON 29%, LAB 47%, LDEM 16%; ICM’s last unweighted figures were CON 27%, LAB 47%, LDEM 21%. As you can see, in terms of past vote, all three samples were pretty similar. The difference is that ICM and Populus then both weighted their samples to reduce the proportion of past Labour voters and increase the proportion of past Conservative voters so it was closer to what actually happened at the last election. Specifically, Populus weighted to shares of CON 32%, LAB 39%, LDEM 21% and ICM weighted to shares of CON 32%, LAB 39%, LDEM 22%. Hence ICM and Populus ended up using samples that contained far more Conservative supporters and far fewer Labour supporters than MORI’s sample did.
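To get a feel for how large those adjustments are, the sketch below divides each target share by the corresponding unweighted share, using the ICM and Populus figures just quoted. It is a deliberate simplification under my own assumptions: real weighting schemes handle other parties, non-voters and demographic weights at the same time, and `implied_weights` is just an illustrative helper name.

```python
def implied_weights(unweighted, target):
    """Rough per-respondent weight for each past-vote group: target / unweighted share."""
    return {party: target[party] / unweighted[party] for party in target}

# Recalled-vote shares quoted in the text (percentages of those who say they voted).
icm_unweighted = {"CON": 27, "LAB": 47, "LDEM": 21}
icm_target = {"CON": 32, "LAB": 39, "LDEM": 22}
populus_unweighted = {"CON": 29, "LAB": 47, "LDEM": 16}
populus_target = {"CON": 32, "LAB": 39, "LDEM": 21}

icm_weights = implied_weights(icm_unweighted, icm_target)
populus_weights = implied_weights(populus_unweighted, populus_target)

for name, weights in (("ICM", icm_weights), ("Populus", populus_weights)):
    summary = ", ".join(f"{party} {w:.2f}" for party, w in weights.items())
    print(f"{name}: {summary}")
```

On these figures a past Labour voter counts as roughly 0.83 of a person in both companies’ samples, while past Conservative voters are weighted up.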
The topline voting intention figures published in the newspapers by the three pollsters weren’t that different - MORI and ICM both gave the Tories a 7 point lead, Populus a lower 4 point lead - though that was after Blair’s resignation. The reason the figures end up so close despite the different samples is that MORI add a very strict filter by likelihood to vote - ignoring everyone who doesn’t say they are 10/10 certain to vote - which vastly boosts the Conservatives. Without that filter Labour would have had a 2 point lead.
Via various different adjustments and filters the pollsters all arrive at roughly similar figures for voting intention. The thing to remember here is the effect on all the other political questions - there is no filtering by likelihood to vote on things like approval figures for party leaders, or whether X or Y would make a good Prime Minister. So remember, when you are looking at MORI figures on David Cameron’s approval ratings, or which party would be best on pensions or whatever, they are the opinions of a sample in which around 47% of people who say they voted claim they voted Labour. When you see the same questions in an ICM poll they are the opinions of a sample in which only 39% of respondents who say they voted say they voted Labour. It’s also worth keeping a beady eye on quick questions done by the phone pollsters on omnibus polls - in ICM and Populus’s monthly polls for the Guardian and the Times with questions on voting intention, the sample will always have been weighted by past vote. Polls without voting intention questions might not have been weighted in this way, and they too might have rather more Labour supporters than you’d normally expect.
14 Responses
Cheers for that thorough breakdown of the methods used by various pollsters. I have to admit to being slightly confused by the ‘weighting by past vote’ method. I’ve always assumed quite a few voters say they voted for the party who appear to be ahead in the polls at the time - regardless of how they actually voted, or even if they voted. However, you can’t argue with the stability of the results these methods tend to produce.
When did weighting by past vote and likelihood to vote become the norm? Surely at some point in the past a straight ‘who do you intend to vote for’ question delivered a fairly accurate result?
May 20th, 2007 at 9:49 pm
Brian - that is one of the drivers suggested as an explanation for false recall.
ICM introduced weighting by past vote somewhere around 1993/4. Populus and YouGov have both done it right from the start, only having started polling around 2003 and 2001 respectively. Communicate started doing it just a month or two ago. MORI don’t do it.
1992 was the election when all the polls were seriously out, but they had been less than wonderful several times before. 1966, 1970 and October 1974 were all pretty poor performances. The difference was that only in 1970 and 1992 was the error great enough to predict the wrong result, and in 1992 everyone got it wrong, while in previous elections the poor performance of some pollsters was always covered by the better performance of others.
May 20th, 2007 at 11:58 pm
Anthony,
Has anyone ever looked at comparing samples from a specific constituency with the actual vote, to see if the false recall rates have a regional or party pattern other than a UK one?
Peter.
May 21st, 2007 at 1:19 am
Peter Cairns:
>Has anyone ever looked at comparing samples from a specific
>constituency with the actual vote to see if the false recall rates
>have a regional or party pattern other than a UK one.
I wonder if there will be a problem with small sample sizes.
One thing the British Election Study does, which may have some analytical interest, is to verify whether respondents actually voted, using the electoral rolls. This will, at least, give you a measure of false-recall when it comes to whether they voted or did not. Since there is often a panel element to the BES (not a very long panel though) it may be possible to see who mis-remembers whether they voted (as well as who they voted for - assuming their post-election response was accurate).
Jon
May 21st, 2007 at 9:24 am
I am still unsure whether the consistently erroneous figures for past vote recall are caused by false recall, by sampling which consistently selects too many Labour supporters, or by a mixture of the two.
May 21st, 2007 at 12:20 pm
I note that the recent YouGov/Economist poll had party allegiance figures on the top line which showed a substantial excess of Labour supporters, but where did they get these figures from?
By coincidence I took part in this survey and NO voting allegiance question was asked, so where did YouGov deduce my party allegiance from? If they know my past voting behaviour from previous polls, why do YouGov need to weight for false recall at all, as they should have info in their records of true past voting behaviour?
It is almost certainly a mixture of the two. We can be pretty much certain that some degree of it is false recall because we know from panel studies that people’s recollections change. It is likely that some of the difference is also down to biased samples, since without such adjustments polls would be biased in favour of Labour when compared to actual results.
The breaks on the YouGov poll would probably have been by party ID, not by past vote. Labour have a higher proportion of identifiers because there are plenty of people out there who say they identify with Labour but actually vote Liberal Democrat, and people who identify with Labour but don’t bother voting. People who say they don’t identify with any particular party at all are more likely to say they will vote Lib Dem or Conservative. The shares of party ID YouGov use to weight correspond to the shares of party ID they found at the time of the last election in a sample that matches the actual result.
As you suggest YouGov don’t need to account for changes in false recall over time for most people, because it is stored on the database how people said they voted at the time of the last election (as it happened, they don’t weight by past vote anyway, they weight by party ID, but this too is taken from May 2005, rather than current surveys since it too will change over time).
May 21st, 2007 at 12:53 pm
An interesting and lucid analysis, but it misses two important points.
May 21st, 2007 at 2:46 pm
The first is that it is misleading to think of the reported past votes purely in terms of the shares of the different parties, without considering the percentage of abstainers. We generally find that a well-drawn sample includes about the right percentages of people claiming to have voted Conservative and Liberal Democrat, but too many saying they voted Labour and too few admitting they did not vote at all. But when you repercentage the figures to consider only the party shares, because there are too many people claiming to have voted, it makes it look as if the Tories and LibDems are under-represented in the sample, when this is not in fact the case. (In this case the misunderstanding is partly our fault for badly presenting the data, since the figures were mistakenly posted on our website with the party figures repercentaged, which should not have been the case. We are correcting the presentation of the data on our website now. This change should appear shortly.)
The second point is that it is simply not true that the ICM and Populus data shows that past vote recall is relatively stable over time. Look at the ICM/Guardian unweighted figures over the past year-and-a-bit: from January to April 2006 their Conservative recall averaged 16.3%, with all four polls at 16%+/- 1; from May to November the average was 19.0, with five of the six polls at 19% or 20% (tables for the June poll are missing from ICM’s website); from December 2006 to February 2007 it was 15% in all the polls followed by 14% in March. There are two absolutely clear step changes involved here. (Yet over this period ICM’s weighting targets for reported past vote were the same every month.) It is the possibility of changes of this type that make us believe that weighting by past vote is as likely to make polls more unreliable as the reverse, and that our samples are more representative (for the measurement of all questions, not just voting intention) without past vote weighting, which would also have made our polls less accurate during the last general election.
Thanks Roger. Sound point on the proportion of abstainers. I’ve repercentaged to include only voters for a couple of reasons. On a simple level I wanted to make the article as approachable as possible, and it makes the figures easier to compare to the figures people are aware of (and the aim of this post was to say that different polls might have different samples, rather than argue one over the other). More importantly though, I’m not sure how Populus and ICM have approached weighting the didn’t votes, don’t knows and refusals, so I’d need to do some more research before writing about it with any confidence.
Taking that into account and looking *very* briefly at the most recent ICM polls, they appear to find a greater proportion of didn’t votes, won’t says and don’t knows than in the most recent MORI poll (though that is a *very* brief look indeed), so perhaps the different approaches to sampling are producing different political balances after all, or perhaps there is a mode effect on false recall between phone and face-to-face polls.
You’ve looked at a longer range of ICM and Populus data than I did when writing the article above. It used to be pretty consistent and I haven’t been tracking them recently. For the article above I just took the averages from the last 6 months or so, and that sustained leap up to 19 is something that had passed me by. Gradual changes in the level of false recall are something that ICM and Populus’ approach should be able to handle (that’s what it’s designed for) - sudden step changes could present a problem, so it looks like it’s worth a closer look.
May 21st, 2007 at 4:17 pm
Anthony, perhaps it is me but I can’t make sense of the Party ID info given in the Economist poll. The unweighted sample not surprisingly has more Lab than Con ID supporters, but the weighted sample, rather than rectifying this, increases the Lab to Con gap dramatically. I just don’t understand what they are saying here.
May 21st, 2007 at 7:51 pm
Mark - unweighted raw phone samples are normally “too Labour” and weighting makes them less Labour. In the case of online samples it obviously depends upon the make-up of the panel and the make-up of the people invited - one company’s panel might be more Labour or more Conservative than someone else’s - but more often than not YouGov’s unweighted raw samples are “not Labour enough” and weighting makes them more Labour.
I can’t stress enough that party ID doesn’t match voting behaviour. Obviously there is a strong correlation, but not everyone who says they identify with the Conservatives votes Conservative, not everyone who says they identify with the Lib Dems votes Lib Dem, loads of people who say they identify with Labour don’t vote Labour (a whole swathe voted Lib Dem in 2005) and those people who say they don’t identify with any particular party (who make up about a quarter of YouGov’s samples) are not non-voters who can be ignored, they are people who often do vote but don’t strongly identify with a single party, i.e. floating voters.
A YouGov sample will be weighted so that it is something like (and this is from memory so don’t quote the figures!) 25% Conservative *identifiers*, 34% Labour *identifiers*, 10% Lib Dem *identifiers*, 25% people who don’t identify with a particular party and so on. If you look at how that sample said in May 2005 they were *voting*, as opposed to who they identified with, it would pretty much match the actual election result.
May 21st, 2007 at 9:21 pm
Anthony,
Could the weighting be the reason everyone underestimated Tory support on the Scottish constituency vote? I am still trying to figure out how three pollsters using different methodologies all underestimated it by the same 3.6%.
Peter.
May 21st, 2007 at 10:43 pm
It seems to me that the vast majority of Scottish opinion polls seem to underestimate Tory support, for some reason.
Nearly every poll since 1999 has put the Tories in the 10-15% bracket, whereas in the 1999, 2003, and 2007 elections the Tories have won a greater share of the vote than this.
May 21st, 2007 at 11:01 pm
Andy,
I don’t know quite how to check it, but it could be because the Tories chalk up a lot of their votes in a handful of seats in Scotland.
If we assume a poll of about 1,000, then you would get about 140 votes in a poll of 14%. But spread over 73 constituencies that’s only two votes per seat. If the Tories were getting 20% of their vote from 10% of the seats it wouldn’t necessarily show.
The LibDems, on the other hand, have a larger number of safe seats so it wouldn’t be so pronounced for them, and both Labour and the SNP tend to have a much more uniform spread.
What I might try to do is work out the average vote per party for first and second place, plus the average overall, to see if the Tory figure is very different from anyone else’s.
Peter.
May 22nd, 2007 at 12:32 pm
Anthony
Is one implication that a large part of the explanation (perhaps all of it) for the persistent over-representation of Labour support and under-representation of Tory support in polls is not people lying about voting Tory (as was once thought), but the portion of Labour supporters who tell pollsters they would vote Labour but, when it comes to it, don’t actually vote at all (for whatever reason)?
Both methodologies are in a way adjusting for this phenomenon: either by weighting the poll to (actual) past voting or, in the case of MORI , by screening out the non-voters.
May 22nd, 2007 at 2:31 pm