Survey analysis tool
Most statistical analyses of surveys, such as opinion polls or market research, rest on the assumption that a sample of 1000 gives an error margin of ±3% at the 95% confidence level, an assumption that is not true in all cases. Although it is valid in most cases, it may be vitally important for pollsters to know in which cases the assumption is wrong, and by how much.
Furthermore, a serious pollster should be aware that the minimum error margin and the maximum confidence level cannot both be chosen in advance: one of the two can be chosen before the survey, while the other can only be computed after the outcome of the survey is known.
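The dependence on the outcome can be illustrated with the textbook normal approximation for a proportion. This is a swapped-in technique for illustration only, not the exact hypergeometric analysis this page performs; the function name `margin_of_error` and its parameters are hypothetical. For a fixed confidence level, the margin depends on the observed share p, which is known only after the survey.

```python
import math
from statistics import NormalDist

def margin_of_error(n, p, confidence, N=None):
    """Half-width of the normal-approximation confidence interval for a proportion.

    n: sample size, p: observed share (0..1), confidence: e.g. 0.95.
    N: optional population size; if given, the finite population
       correction sqrt((N - n) / (N - 1)) is applied.
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)  # two-sided quantile
    se = math.sqrt(p * (1.0 - p) / n)                 # standard error of the share
    if N is not None:
        se *= math.sqrt((N - n) / (N - 1))
    return z * se

# Same sample size and confidence level, different observed shares.
for p in (0.5, 0.2, 0.05):
    print(f"p={p:.2f}: +/-{100 * margin_of_error(1000, p, 0.95):.2f}%")
```

At p = 0.5 this gives roughly ±3.1%, which is where the familiar ±3% rule comes from; for shares far from 50% the approximation yields a smaller margin and also becomes less reliable.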
This page, however, is dedicated to the analysis of survey results by means of the discrete hypergeometric distribution.
The limitation the user may face is the time required to compute the results. Although one of the fastest algorithms is used here, the computations become time-consuming (on the order of tens of seconds) for population sizes N above ten million. The upper limit on computation time on this server is 30 seconds per case.
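For readers unfamiliar with the hypergeometric distribution, the following minimal sketch evaluates its probability mass function using log-gamma to avoid overflow. It is not the algorithm used by this tool, which the page does not describe; the names `log_binomial` and `hypergeom_pmf` are hypothetical, and M denotes an assumed number of positives in the whole population.

```python
import math

def log_binomial(a, b):
    """Natural log of the binomial coefficient C(a, b), via log-gamma."""
    return math.lgamma(a + 1) - math.lgamma(b + 1) - math.lgamma(a - b + 1)

def hypergeom_pmf(m, n, M, N):
    """P(m positives in a sample of n | M positives in a population of N)."""
    if m < max(0, n - (N - M)) or m > min(n, M):
        return 0.0  # outcome impossible for these parameters
    log_p = log_binomial(M, m) + log_binomial(N - M, n - m) - log_binomial(N, n)
    return math.exp(log_p)

# Inputs of the page's test analysis (n=500, m=100, N=123456), with an
# assumed M equal to the expected value m*N/n rounded down.
n, m, N = 500, 100, 123456
M = m * N // n
print(hypergeom_pmf(m, n, M, N))
```

A confidence interval for M is then obtained by scanning candidate values of M and keeping those under which the observed m is not too improbable; the cost of that scan grows with N, which is consistent with the time limit mentioned above.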
Input
Output
case         M_{min}  M_{max}  p_{min}  p_{max}  - perc.  + perc.  elapsed
realistic    0        0        0.00%    0.00%    +0.00%   +0.00%   0.000 s
symmetric    0        0        0.00%    0.00%    +0.00%   +0.00%   0.000 s
optimistic   0        0        0.00%    0.00%    +0.00%   +0.00%   0.000 s
pessimistic  0        0        0.00%    0.00%    +0.00%   +0.00%   0.000 s
Gaussian     0        0        0.00%    0.00%    +0.00%   +0.00%   0.000 s
Description of input data
types:

n (sample size), m (number of positive answers in the sample), N (population size) = integers; r (confidence level) = float

requirements:

m≥0, m≤n≤N, 0≤r≤100%

test analysis:

n=500, m=100, N=123456, r=99%
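The requirements above can be checked before any computation starts. The following is a minimal sketch, assuming r is passed as a percentage; the function name `check_inputs` is hypothetical and not part of the tool.

```python
def check_inputs(n, m, N, r):
    """Raise ValueError unless the inputs satisfy the stated requirements.

    r is the confidence level in percent (e.g. 99 for 99%).
    """
    for name, value in (("n", n), ("m", m), ("N", N)):
        # bool is a subclass of int in Python, so exclude it explicitly
        if isinstance(value, bool) or not isinstance(value, int):
            raise ValueError(f"{name} must be an integer")
    if not (0 <= m <= n <= N):
        raise ValueError("required: 0 <= m <= n <= N")
    if not (0.0 <= float(r) <= 100.0):
        raise ValueError("required: 0 <= r <= 100 (percent)")

check_inputs(500, 100, 123456, 99.0)  # the test analysis passes silently
```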

Description of output data
case:

Five different confidence intervals that can be claimed with the prescribed confidence level: realistic (tails cutoff), symmetric (+/- percentage with respect to the expected value mN/n), optimistic/pessimistic (extremes of biased interpretations), and an approximation with the Gaussian distribution.

M_{min}, M_{max}:

Lower and upper bounds of the confidence interval for the given case, expressed as a number of population members.

p_{min}, p_{max}:

Confidence interval as fractions of the population size, i.e. the values M_{min} and M_{max} divided by the population size N.

+/- perc.:

Deviations of the M_{min} and M_{max} values from the expected value mN/n, in percent.

Examples
Opinion poll analysis
Imagine 700 people have been randomly chosen and asked about their preferences among the candidates A, B, and C. The candidate A gets 20% of the votes.
What can we deduce from the poll results, if the population size is 1.6 million, and we want to be 99% sure? We feed the tool with the input data n=700, m=140, N=1600000, and r=99%.
The output gives five cases. Let us consider the realistic case: there is a 99% probability that candidate A has between 261744 and 386102 votes, which is between 16.36% and 24.13% of the total population. The deviations from the expected value of 20% are -3.64% and +4.13%.
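The derived columns in this example can be reproduced from M_{min} and M_{max} with a few lines of arithmetic. The sketch below assumes that the deviations are expressed as a percentage of the population size N, i.e. 100·(M − mN/n)/N; this formula is inferred from the numbers quoted here rather than stated by the page.

```python
n, m, N = 700, 140, 1_600_000
M_min, M_max = 261_744, 386_102   # realistic case, as quoted in the text

expected = m * N / n               # expected value mN/n
p_min, p_max = M_min / N, M_max / N
# Assumed definition: deviation from the expected value, relative to N.
dev_minus = 100 * (M_min - expected) / N
dev_plus = 100 * (M_max - expected) / N

print(f"expected = {expected:.0f}")
print(f"p_min = {100 * p_min:.2f}%, p_max = {100 * p_max:.2f}%")
print(f"deviations = {dev_minus:+.2f}%, {dev_plus:+.2f}%")
```

The printed values match the figures in the example: an expected value of 320000 votes, an interval of 16.36% to 24.13%, and deviations of -3.64% and +4.13%.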
Prevalence of a disease
Around Y2K an outbreak of mad cow disease occurs. The authorities in many countries want to know what proportion of their animals is infected. Let us assume that in a population of 200000 animals they test a random sample of 3000 and find zero positive cases. The input data are n=3000, m=0, and N=200000; for the confidence level we choose 95%. The authorities are interested in the pessimistic case of the results: the most conservative estimate that holds with 95% probability. From the output table we get M_{min}=0 and M_{max}=198, which means that, with 95% confidence, no more than 198 animals in the whole population are infected.
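The bound of 198 can be cross-checked with the exact hypergeometric probability of drawing zero positives. The page does not state how the pessimistic case is computed, so the sketch below uses one plausible criterion as an assumption: take the largest M for which a sample with no positives still has probability at least 1 − r. The function name `max_positives_given_zero_hits` is hypothetical.

```python
import math

def max_positives_given_zero_hits(N, n, r):
    """Largest M with P(0 positives in sample | M positives in population) >= 1 - r.

    N: population size, n: sample size, r: confidence level as a fraction (0 < r < 1).
    """
    if not 0.0 < r < 1.0:
        raise ValueError("r must lie strictly between 0 and 1")
    log_alpha = math.log(1.0 - r)
    log_p = 0.0  # log P(X=0 | M=0) = log 1
    M = 0
    while M < N - n:
        # Recurrence: P(X=0 | M+1) = P(X=0 | M) * (N - n - M) / (N - M)
        next_log_p = log_p + math.log((N - n - M) / (N - M))
        if next_log_p < log_alpha:
            break  # M+1 positives would make an all-negative sample too unlikely
        log_p = next_log_p
        M += 1
    return M

print(max_positives_given_zero_hits(200_000, 3_000, 0.95))  # prints 198
```

Under this assumed criterion the result agrees with the M_{max} reported in the output table.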
Copyright (c) Marvin  Institute for Computational Applications 2009, all rights reserved. 