Confidence intervals (CI) are used in inferential statistics as a tool to estimate the value of a population parameter. These provide a greater amount of information about the true value of a parameter than do point estimators, since they represent an interval of values of finite width within which we have a certain degree of confidence that the true value of the parameter will lie. The latter is something that point estimators do not provide.
Confidence intervals for two populations
When we are interested in comparing two different populations, we are often interested in knowing if a certain parameter of one of them is greater than, less than, or equal to the corresponding parameter of the other. For example, when comparing the performance of two electric motors, we may be interested in determining whether or not the torque of motor A is greater than that of motor B. In this case, we are comparing two population means.
Often, however, we want to compare not the mean values of a parameter but the proportion of each population that meets a certain condition. In this case, the goal is to establish a confidence interval that estimates the difference between two population proportions.
Inferences about the difference of two population proportions, P1 – P2
There are many different situations in which we may be interested in the difference between two population proportions. As we mentioned before, this difference allows us to compare equivalent proportions in two different populations. Some examples of research problems that require establishing a confidence interval for the difference between two population proportions are presented below:
- In clinical trials of a new medical treatment, it is of particular importance to compare the proportion of individuals who show an improvement in their medical condition in the population that received the treatment with the same proportion in the group of individuals that only received the placebo.
- When we want to compare the proportion of women and men who agree or disagree with a certain government measure.
- In business, we are often interested in comparing the quality of the manufacturing process in two different production lines. In this case, the proportions of defective or nonconforming items produced by both production lines in a given period of time can be compared.
- In the field of microbiology, we may be interested in comparing the proportion of bacterial colonies that survive after being treated with different chemical disinfectants.
- Marketers often run A/B tests to determine which content on a web page is most effective in converting prospects to buyers. To do this, half of the people who access the website are shown content A and the other half are shown alternative content B, and the proportions of visitors who actually bought the suggested product or service are then compared.
From the comparison of P1 and P2 to the difference P1 – P2
There are many more examples of situations in which we may be interested in comparing the proportions of two different populations. This comparison can be made in different ways. For example, we may want to know if:
- Both proportions are equal (P1 = P2)
- Proportion 1 is greater than proportion 2 (P1 > P2)
- Proportion 1 is less than proportion 2 (P1 < P2)
In any of these cases, these statements can be rewritten in terms of the difference between the proportions:
- If we are interested in finding out if P1 = P2, this is equivalent to determining if P1 – P2 = 0
- If we are interested in finding out if P1 > P2, this is equivalent to determining if P1 – P2 > 0
- If we are interested in finding out if P1 < P2, this is equivalent to determining if P1 – P2 < 0
Therefore, any comparison between population proportions can be resolved by finding a confidence interval for the difference between population proportions and then carrying out an appropriate analysis of the result.
But how are these confidence intervals established?
This is achieved by analyzing samples from each population and using the tools of inferential statistics. This procedure depends on whether we are working with large or small samples.
Confidence Interval Estimation of the difference of two population proportions from large samples (n ≥ 30)
The confidence interval for the difference in population proportions can be derived as an extension of the confidence interval for a single binomial proportion. In the binomial case (i.e., the outcome of the experiment or observation is a success or a failure and P represents the probability of success), the sample proportion p from a large sample follows an approximately normal distribution with mean P (the population proportion) and variance P(1 – P)/n, as long as the probability of success is not too close to 1 or 0.
In the case of the difference between two population proportions, P1 – P2, we can establish the limits of the confidence interval from two independent samples with proportions p1 and p2. If these samples meet the same conditions as above (sample sizes n1 and n2 large, and proportions p1 and p2 far from 1 and 0) and therefore follow approximately normal distributions, the difference p1 – p2 will also follow an approximately normal distribution with mean P1 – P2 and a variance estimated by p1(1 – p1)/n1 + p2(1 – p2)/n2.
Given these results, a confidence interval for the difference of two population proportions obtained from large samples, with a confidence level of 100(1 – α)%, where α represents the level of significance, is given by:
$$(p_1 - p_2) \pm Z_{\alpha/2}\sqrt{\frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}}$$
In the above formula, Z α/2 corresponds to the value of Z in the standard normal distribution that leaves an area of α/2 to its right.
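As a rough illustration, the large-sample interval described above can be sketched in Python. The function name `diff_proportions_ci` and its arguments are hypothetical, chosen only for this example; the critical value is taken from `statistics.NormalDist` in the standard library, and the inputs in the usage line are made-up values rather than data from this article.

```python
from math import sqrt
from statistics import NormalDist


def diff_proportions_ci(p1, n1, p2, n2, alpha=0.05):
    """Large-sample confidence interval for P1 - P2 (independent samples)."""
    # Critical value of Z that leaves an area of alpha/2 to its right.
    z = NormalDist().inv_cdf(1 - alpha / 2)
    # Estimated standard error of p1 - p2:
    # sqrt(p1(1 - p1)/n1 + p2(1 - p2)/n2)
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - z * se, diff + z * se


# Hypothetical inputs: 50% successes in 100 trials vs. 40% in 100 trials.
lower, upper = diff_proportions_ci(0.5, 100, 0.4, 100)
print(round(lower, 4), round(upper, 4))
```

The function returns the lower and upper limits as a pair, so the sign of each limit can be inspected directly.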
Confidence interval for the difference of two population proportions from small samples (n < 30)
If either sample size is less than 30, or if either proportion is very close to 0 or 1, the distribution of the corresponding sample proportion cannot be adequately approximated by a normal distribution. In this case, the difference of the two proportions will not follow a normal distribution either, which is why the above formula for the confidence interval does not apply.
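A check of these conditions might look like the sketch below. The article gives the n ≥ 30 threshold but no numeric rule for "very close to 0 or 1", so the `min_count` criterion used here (at least 5 expected successes and 5 expected failures) is an assumption borrowed from a common rule of thumb, and the function name is hypothetical.

```python
def large_sample_ok(p, n, min_n=30, min_count=5):
    """Return True if the normal approximation looks reasonable for one sample.

    min_n follows the n >= 30 threshold used in this article.
    min_count is an ASSUMED rule of thumb (not from the article): it requires
    at least min_count observed successes and failures, which rules out
    proportions too close to 0 or 1 for the given sample size.
    """
    enough_data = n >= min_n
    not_extreme = n * p >= min_count and n * (1 - p) >= min_count
    return enough_data and not_extreme


# Both samples must pass before the large-sample formula is used.
print(large_sample_ok(0.5, 20))    # small sample
print(large_sample_ok(0.01, 100))  # proportion very close to 0
```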
The inference about the difference in population proportions based on small samples is considerably complex, and is beyond the scope of this article.
Interpretation of the confidence interval for the difference of two population proportions
After calculating the confidence interval for the difference of two population proportions, the result must be interpreted. There are three possible outcomes, each interpreted differently.
Let us consider any case in which a confidence interval is obtained with a confidence level of 100(1 – α)% or, simply, a significance level of α, whose lower and upper limits are LI and LS, respectively. That is to say:
$$LI \le P_1 - P_2 \le LS$$
Depending on the sign of the limits obtained, we can reach different conclusions regarding the difference between both population proportions:
- If both the lower and upper limits are negative, then we can say, with a confidence level of 100(1 – α)%, that the proportion in population 2 is greater than the respective proportion in population 1. That is, we can say that P1 < P2 or that P2 > P1.
- If the lower limit is negative and the upper limit is positive, and therefore the confidence interval contains zero, then we can say, with a confidence level of 100(1 – α)%, that there is no statistically significant difference between the two population proportions. That is, the data are consistent with P1 = P2.
- Finally, if both the lower and upper limits are positive, then we can say, with a confidence level of 100(1 – α)%, that the population 1 proportion is greater than the respective population 2 proportion. That is, we conclude that P1 > P2.
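The three cases above amount to a simple decision on the signs of the limits, which could be sketched as follows. The function name and the returned strings are illustrative only, and a limit exactly equal to zero is treated here as an interval that contains zero, which is an assumption of this sketch.

```python
def interpret_interval(lower, upper):
    """Map the signs of the interval limits to one of the three conclusions."""
    if upper < 0:
        # Both limits negative: P1 - P2 < 0.
        return "P1 < P2"
    if lower > 0:
        # Both limits positive: P1 - P2 > 0.
        return "P1 > P2"
    # Interval contains zero: no statistically significant difference.
    return "no significant difference"


print(interpret_interval(-0.12, -0.03))
print(interpret_interval(-0.04, 0.14))
print(interpret_interval(0.02, 0.10))
```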
Example of calculating the confidence interval for two population proportions
Statement
Suppose that a survey was carried out on a random sample of 250 Mexican engineering students to find out what proportion of them had mastered the concept of confidence intervals. The results of the survey showed that 64.8% of them had not mastered it, while the rest had. The same survey was also carried out on a sample of 180 Spanish engineering students, of whom 54 answered that they had mastered the concept of confidence intervals.
Is there a difference between the proportions of Spanish and Mexican students who master the concept of confidence intervals, at a significance level of 0.05?
Solution
As we can see from the question, what we want is to determine whether or not there is a difference between the proportions of two different populations. The proportion of interest consists of the proportion of students who do master the concept of confidence intervals, so that, in this case, responding affirmatively to the survey represents success from the point of view of the binomial experiment.
For the population of Mexican students, the sample was 250 students, and the statement indicates that the proportion of students who have not mastered the subject in question is 64.8%. But this is not the proportion we want, since not mastering the subject is a failure. Therefore, this proportion corresponds to the complement q. In view of this, the proportion of successes, p, for the sample of Mexican students is:
$$p_{MEX} = 1 - q_{MEX} = 1 - 0.648 = 0.352$$
On the other hand, in the case of the sample of Spanish students, we have the number of successes and the total size of the sample, so the proportion of successes will be:
$$p_{ESP} = \frac{54}{180} = 0.300$$
These results are summarized in the following table.
Mexican students | Spanish students |
n_MEX = 250 | n_ESP = 180 |
p_MEX = 0.352 | p_ESP = 0.300 |
As we can see, both sample sizes are considerably larger than 30, so they are considered large samples. In addition, neither the proportion for Mexican students nor that of Spanish students is considerably close to 0 or 1. Finally, despite the fact that the statement does not specify it, we can assume that both samples are independent of each other.
Under these conditions, we can say that both sample proportions, as well as their difference, will follow an approximately normal distribution. Therefore, we can use the previous equation to determine the confidence interval, which will be:
$$(p_{MEX} - p_{ESP}) \pm Z_{\alpha/2}\sqrt{\frac{p_{MEX}(1 - p_{MEX})}{n_{MEX}} + \frac{p_{ESP}(1 - p_{ESP})}{n_{ESP}}}$$
Note that, to establish the confidence interval, we need the value of Z for half of the given significance level, which in this case is α = 0.05. That is, we must find Z α/2 = Z 0.05/2 = Z 0.025 . This value can be found in a standard normal distribution table, using a mobile statistics application or using a spreadsheet such as Excel for Windows or Numbers for MacOS.
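Besides tables and spreadsheets, the critical value can also be obtained programmatically. The short sketch below uses `statistics.NormalDist` from the Python standard library; this tool is not mentioned in the original text and is offered only as one possible alternative.

```python
from statistics import NormalDist

alpha = 0.05
# Value of Z that leaves an area of alpha/2 = 0.025 to its right,
# i.e. the 97.5th percentile of the standard normal distribution.
z = NormalDist().inv_cdf(1 - alpha / 2)
print(round(z, 6))  # 1.959964
```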
In this case, Z 0.025 = 1.959964. So, the confidence interval will be:
$$(0.352 - 0.300) \pm 1.959964\sqrt{\frac{0.352(1 - 0.352)}{250} + \frac{0.300(1 - 0.300)}{180}} = 0.052 \pm 0.0894$$
$$-0.0374 \le P_{MEX} - P_{ESP} \le 0.1414$$
As we can see, the confidence interval calculated in this way contains zero, which is why it is concluded, with a confidence level of 95%, that there is no significant difference between the proportions of Mexican and Spanish students who master the concept of confidence intervals.
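The whole calculation for this example can be reproduced with a few lines of Python. This is only a sketch of the arithmetic in the solution; the variable names are chosen for this illustration and do not come from the original text.

```python
from math import sqrt
from statistics import NormalDist

# Data from the statement.
n_mex, q_mex = 250, 0.648   # Mexican sample: size and proportion of failures
n_esp, x_esp = 180, 54      # Spanish sample: size and number of successes
alpha = 0.05                # significance level

p_mex = 1 - q_mex           # proportion of successes (0.352)
p_esp = x_esp / n_esp       # proportion of successes (0.300)

# Critical value and estimated standard error of the difference.
z = NormalDist().inv_cdf(1 - alpha / 2)
se = sqrt(p_mex * (1 - p_mex) / n_mex + p_esp * (1 - p_esp) / n_esp)

diff = p_mex - p_esp
lower, upper = diff - z * se, diff + z * se
contains_zero = lower <= 0 <= upper

print(f"{lower:.4f} <= P_MEX - P_ESP <= {upper:.4f}")
print("Interval contains zero:", contains_zero)
```

Run as written, this should print limits of about -0.0374 and 0.1414 and report that the interval contains zero, in line with the conclusion above.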