How to Construct a Confidence Interval of a Population Proportion

Article reviewed and approved by our editorial team, following YuBrain's writing and editing criteria.

The confidence interval of a statistical parameter is the range of values that the parameter is estimated to take; in other words, it consists of two values between which the parameter is expected to lie with a certain level of confidence. Calculating the confidence interval is part of estimating a population parameter: the value of the parameter is estimated from a sample of the population and, in the same calculation, the confidence interval for that estimate is determined. One type of parameter that can be estimated using inferential statistics is a population proportion.

For example, one might ask what percentage of a country's population supports a certain law. An answer to this type of question should include a confidence interval for the estimated value. Below we see how the confidence interval of a population proportion is constructed, along with part of its theoretical basis.

As already mentioned, the confidence interval of a statistical parameter is defined by two values between which the parameter is expected to lie with a certain level of confidence; the estimate of the parameter sits at the center of this range. Thus, a confidence interval has the form

estimator +/- uncertainty

Therefore, there will be two numbers that must be determined: the estimate of the parameter that we are studying and the uncertainty or margin of error.

Calculation premises

A statistical calculation is valid only when certain premises specific to that calculation are met. For a confidence interval for a population proportion, the premises are the following.

1. The sample must be drawn at random from a population that is large. The sample contains n cases.

2. The members of the sample must be chosen independently of each other.

3. There must be at least 15 successes and 15 failures in the sample of size n.
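Of the three premises, only the third can be checked from the counts alone; randomness and independence depend on how the sample was collected. A minimal Python sketch of that check follows. The function name and the `minimum` parameter are illustrative choices, not part of any library.

```python
def meets_success_failure_condition(successes: int, n: int, minimum: int = 15) -> bool:
    """Return True when the sample has at least `minimum` successes and failures.

    `minimum` defaults to 15, the threshold stated in premise 3.
    """
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("n must be positive and successes must lie between 0 and n")
    failures = n - successes
    return successes >= minimum and failures >= minimum
```

A sample that fails this check is a sign that the normal approximation used later may not be reliable for that sample.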

Proportion of sample and population

Let's look at the procedure for estimating a proportion in a population. Just as a sample mean is used to estimate a population mean, a sample proportion is used to estimate a population proportion. The population proportion is the unknown parameter, the value to be determined. We will call p the population parameter under study, that is, the proportion of the population that meets a certain criterion. The estimate is calculated by adding up the successes recorded in the sample and dividing the sum by n, the total number of cases in the sample. To distinguish this sample proportion from the population proportion, we write it with a line above it:

$$\bar{p} = \frac{\text{number of successes in the sample}}{n}$$

The proportion in the sample is the estimator of the proportion in the population.
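As a small illustration of this calculation, the sketch below computes a sample proportion from raw yes/no responses. The function name and the survey data are hypothetical and serve only to show the arithmetic.

```python
from typing import Sequence


def sample_proportion(responses: Sequence[bool]) -> float:
    """Return the fraction of responses that are successes (True)."""
    if len(responses) == 0:
        raise ValueError("the sample must contain at least one case")
    # True counts as 1 and False as 0, so sum() adds up the successes.
    return sum(responses) / len(responses)


# Hypothetical survey of five people: three successes, two failures.
responses = [True, False, True, True, False]
p_bar = sample_proportion(responses)
print(p_bar)  # 0.6
```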

To determine the confidence interval of a proportion of a population, it is necessary to know what its statistical distribution is, as shown in the following figure.

Statistical distribution of the proportion of a population.

With the statistical distribution it is possible to determine the estimator and the standard error SE, the two values that make up the confidence interval

$$\bar{p} \pm z^{*} \cdot SE$$

where z* is the critical value of the standard normal distribution that corresponds to the chosen confidence level.

In these problems, the standard error SE follows from the binomial distribution and is a function of the estimator of p, the proportion of positive cases in the sample of size n, as the following formula shows.

$$SE = \sqrt{\frac{\bar{p}\,(1 - \bar{p})}{n}}$$

The general definition of the standard deviation uses the value of p, which is unknown, so the standard error is used instead, replacing p with its estimator, as the previous formula shows.
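The sketch below evaluates the standard error for a few sample sizes to show how it depends on n. The function name and the sample proportion of 0.5 are illustrative assumptions.

```python
import math


def standard_error(p_bar: float, n: int) -> float:
    """Standard error of a sample proportion: sqrt(p_bar * (1 - p_bar) / n)."""
    if n <= 0 or not 0.0 <= p_bar <= 1.0:
        raise ValueError("n must be positive and p_bar must lie between 0 and 1")
    return math.sqrt(p_bar * (1.0 - p_bar) / n)


# Illustrative sample proportion of 0.5 at three sample sizes.
for n in (100, 400, 1600):
    print(n, round(standard_error(0.5, n), 4))
```

Because n sits under the square root, quadrupling the sample size halves the standard error.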

Another aspect to consider is that under the three premises that were established, the binomial distribution can be approximated with the standard normal distribution.
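One way to check this approximation numerically is to compute, from the exact binomial distribution, the probability that the sample proportion falls within 1.96 standard deviations of p, and compare it with the 0.95 that the normal distribution predicts. The sketch below does this for a hypothetical population proportion of 0.6 and a sample of 100; both values and the function names are assumptions chosen for illustration.

```python
import math


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Exact probability of k successes in n independent trials."""
    return math.comb(n, k) * p**k * (1.0 - p) ** (n - k)


def prob_within(n: int, p: float, z: float) -> float:
    """Exact binomial probability that the sample proportion k/n lies
    within z standard deviations of the population proportion p."""
    sd = math.sqrt(p * (1.0 - p) / n)
    return sum(
        binomial_pmf(k, n, p)
        for k in range(n + 1)
        if abs(k / n - p) <= z * sd
    )


# Hypothetical population proportion 0.6, sample size 100.
print(prob_within(100, 0.6, 1.96))
```

When the premises hold, the printed probability should be close to 0.95; for small samples or proportions near 0 or 1 the agreement is typically worse.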

In this way, the formula for the confidence interval of a population proportion is obtained. Here z* is the standard normal critical value for the chosen confidence level.

$$\bar{p} \pm z^{*} \sqrt{\frac{\bar{p}\,(1 - \bar{p})}{n}}$$
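The formula translates directly into a short function. The following is a minimal sketch, assuming the critical value is supplied by the caller (the default of 1.96 corresponds to 95% confidence); the function name and the survey counts in the usage line are hypothetical.

```python
import math


def proportion_confidence_interval(successes: int, n: int, z_star: float = 1.96):
    """Return (lower, upper) bounds of the interval
    p_bar +/- z_star * sqrt(p_bar * (1 - p_bar) / n)."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("n must be positive and successes must lie between 0 and n")
    p_bar = successes / n                                # point estimate
    margin = z_star * math.sqrt(p_bar * (1.0 - p_bar) / n)  # uncertainty
    return p_bar - margin, p_bar + margin


# Hypothetical survey: 45 successes out of 150 cases, 95% confidence.
low, high = proportion_confidence_interval(45, 150)
print(f"{low:.3f} to {high:.3f}")
```

Passing the critical value as a parameter keeps the function independent of how the confidence level is converted into z*.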

The confidence level is the percentage of the area under the standard normal distribution that the interval is to cover, as shown in the previous figure; the larger the area, the higher the level of confidence in the interval. The following table shows the critical values for different confidence levels, each of which expresses the area of the distribution to be covered.

Confidence level.
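Critical values of this kind can also be computed rather than looked up. The sketch below uses `NormalDist` from Python's standard `statistics` module (Python 3.8 or later); the function name `critical_value` and the three confidence levels shown are illustrative choices, not a reproduction of the table.

```python
from statistics import NormalDist


def critical_value(confidence: float) -> float:
    """Two-sided critical value of the standard normal distribution.

    The central area equals `confidence`, so each tail holds
    (1 - confidence) / 2 of the distribution.
    """
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must lie strictly between 0 and 1")
    return NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)


for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%}: z* = {critical_value(level):.3f}")
# 90%: z* = 1.645
# 95%: z* = 1.960
# 99%: z* = 2.576
```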

Example of Determining a Confidence Interval for a Population Proportion

Suppose we want to know with 95% confidence the percentage of the electorate in a city that identifies with a given political party. We collect the information in a simple random sample made up of 100 people in that city and we find that 64 of them identify with the political party.

First, we verify that the three premises are met. The survey measures the opinion of a city's population, which is large, and the sample is taken at random. In this case n is equal to 100. The response of each of the 100 people was collected independently of the others. Both the positive responses, that is, the successes (64), and the negative responses, that is, the failures (36), exceed 15 cases.

The sample proportion is the estimator of the parameter we want to determine, namely the proportion of the city's population that identifies with the political party in question. It is the quotient of the positive cases and the number n of cases in the sample: 64 divided by 100, or 0.64. This is the value of the estimator and the center of the confidence interval.

The formula for the uncertainty has two factors. The first is the critical value for the chosen confidence level of 95%, which is 1.96. The second is the standard error: substituting 0.64 and 100 into the formula gives 0.048. The product of the two factors is the uncertainty, 0.094. So the confidence interval in this example is

0.640 +/- 0.094

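This confidence interval is interpreted as follows: with 95% confidence, the proportion of people in the city who identify with the political party lies between 54.6% and 73.4%. Here, 95% confidence means that if the sampling were repeated many times, about 95% of the intervals constructed this way would contain the true population proportion; it does not mean that the results represent 95% of the population.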
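The meaning of the confidence level can be illustrated with a small simulation: draw many samples from a population whose proportion is known, build an interval from each, and count how often the interval contains the true value. The sketch below assumes a true proportion of 0.64 purely for the simulation (in practice it is unknown); the function names, the number of trials, and the seed are illustrative.

```python
import math
import random


def interval_for(successes: int, n: int, z_star: float = 1.96):
    """Interval p_bar +/- z_star * sqrt(p_bar * (1 - p_bar) / n)."""
    p_bar = successes / n
    margin = z_star * math.sqrt(p_bar * (1.0 - p_bar) / n)
    return p_bar - margin, p_bar + margin


def simulated_coverage(true_p: float, n: int, trials: int, seed: int = 0) -> float:
    """Fraction of simulated samples whose interval contains true_p."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    covered = 0
    for _ in range(trials):
        # Each case is an independent success with probability true_p.
        successes = sum(rng.random() < true_p for _ in range(n))
        low, high = interval_for(successes, n)
        if low <= true_p <= high:
            covered += 1
    return covered / trials


print(simulated_coverage(true_p=0.64, n=100, trials=2000))
```

The printed fraction should land in the neighborhood of 0.95. It usually will not equal 0.95 exactly, both because of simulation noise and because the normal approximation is only approximate for a finite sample.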

Related statistical concepts

There are a number of ideas and statistical issues involved in determining this type of confidence interval. For example, we might perform a hypothesis test related to the value of the population proportion. We could also compare two proportions from two different populations.


Sergio Ribeiro Guevara (Ph.D.)
(Doctor of Engineering) - CONTRIBUTOR. Science communicator. Nuclear physics engineer.
