Representative samples: the Holy Grail
Posted Apr 18, 2005 18:20 UTC (Mon) by Max.Hyre (subscriber, #1054)
In reply to: Linux wins on security in survey of 6,000+ software developers by jwb
Parent article: Linux wins on security in survey of 6,000+ software developers
If I took anything away from my statistics courses, it's that the
absolutely hardest part to get right is sampling.
(Though figuring out the right statistical analysis to use is close
behind.)
It's hard because you have to:
- First, figure out what your sample
population is: Sysadmins? Developers? CIOs? A mixture of them in
various proportions? Can you pick out the subset that knows what it's
talking about?
- Then you have to figure out how to generate a random
sample of the members of your population, which is not trivial.
- Next, how do you reach the members of that sample? Always
have Dewey vs. Truman floating in front of your eyes. (In 1948 the pollsters
reached their sample population [the electorate] by telephone, heavily biasing
it toward the well-off. For gory details, google for `truman
dewey poll'.)
- Finally, after doing a good job of all of the above, you have to
get your sample to respond to you. How many will be on vacation in
Lower Slobovia? How many check their voicemail or e-mail often enough
to respond in time to do you any good? How many will flatly refuse to
have anything to do with you?
Discarding these sample points, either by not counting them or by
choosing someone else in their stead, puts a real dent in the
randomness.
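To make the list concrete, here is a toy Python simulation. The population, the 30% "well-off" share, the candidate "A", and all the reach and response rates are invented for illustration; none of them come from any real poll. It compares a simple random sample, a sample drawn from a telephone-style frame that over-represents the well-off, and a random sample that loses its non-responders.

```python
import random

# A toy electorate; every number here is invented for illustration.
# 30% of voters are "well-off", and well-off voters favor a hypothetical
# candidate A more often than everyone else does.
random.seed(1948)

POP_SIZE = 100_000
SAMPLE_SIZE = 1_500

population = []
for _ in range(POP_SIZE):
    well_off = random.random() < 0.30
    favors_a = random.random() < (0.65 if well_off else 0.40)
    population.append((well_off, favors_a))


def share_for_a(group):
    """Fraction of a group of (well_off, favors_a) tuples that favors A."""
    return sum(favors_a for _, favors_a in group) / len(group)


true_share = share_for_a(population)

# Step 1: a simple random sample; every member is equally likely to be picked.
srs = random.sample(population, SAMPLE_SIZE)

# Step 2: a biased frame. Assume only the well-off plus 10% of everyone
# else can be reached (say, by telephone), then sample from that frame.
reachable = [p for p in population if p[0] or random.random() < 0.10]
frame_sample = random.sample(reachable, SAMPLE_SIZE)

# Step 3: nonresponse. Start from the random sample, but assume A's
# supporters answer 60% of the time and everyone else only 30%.
responders = [p for p in srs if random.random() < (0.60 if p[1] else 0.30)]

print(f"true share for A:               {true_share:.3f}")
print(f"simple random sample:           {share_for_a(srs):.3f}")
print(f"telephone-style frame:          {share_for_a(frame_sample):.3f}")
print(f"random sample, responders only: {share_for_a(responders):.3f}")
```

The first estimate should land close to the true share; the other two drift toward A because the people who end up counted are not a random slice of the population, and a bigger sample does nothing to fix that. Replacing non-responders with fresh draws who do answer doesn't help either, since the replacements come from the same willing-to-answer pool.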
So, just as you understand that ``surf over here and answer some
questions'' or ``dial in to tell whether you prefer Princess Di or
Camilla'' polls are nothing more than entertainment, you should take any poll
like BZ Research's with many grains of salt.
The whole thing is dubious without a clear description of how each of the
above steps was handled, reviewed by a knowledgeable, disinterested observer.
Look at research reports in Science or Nature to see the
sort of detail I mean. I'd bet a candy bar that the
``2.5 percentage points'' margin of error is nothing more than the number they
looked up in a table for a sample size of 6k.
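Such tables are usually computed from the textbook formula for a proportion, sketched below under the usual assumptions: a simple random sample, the normal approximation, 95% confidence (z = 1.96), and the conservative worst case p = 0.5. The function name is just for illustration.

```python
import math


def margin_of_error(n, p=0.5, z=1.96):
    """Half-width of a confidence interval for a proportion.

    Assumes a simple random sample of size n and the normal approximation;
    p = 0.5 gives the largest (most conservative) margin.
    """
    return z * math.sqrt(p * (1 - p) / n)


for n in (500, 1500, 6000):
    print(f"n = {n:5d}: +/- {100 * margin_of_error(n):.1f} percentage points")
# n =   500: +/- 4.4 percentage points
# n =  1500: +/- 2.5 percentage points
# n =  6000: +/- 1.3 percentage points
```

Note what the formula assumes: it measures only the random scatter of a truly random sample, so it says nothing about a biased frame or about who declined to answer. It also lets you check a quoted margin against the number of respondents it is supposed to describe.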
And now, for an entirely different kind of bias, look no further than
the polls on the nightly news. They tend to be self-fulfilling prophecies:
``Well, if everyone feels like
that, why should I bother to vote / call my Senator / complain to the
Planning & Zoning Board?'' ``Hmmm, if no one's using Linux, I
should hold off.''
I hope I've loosened your faith in polls somewhat. :-/