R

What is the difference between McNemar’s test and the chi-squared test, and how do you know when to use each?

  • November 18, 2013

I have tried reading up on different sources, but I am still not clear what test would be the appropriate in my case. There are three different questions I am asking about my dataset:

  1. The subjects are tested for infections from X at different times. I want to know if the proportions of positive for X after is related to the proportion of positive for X before:
            After   
          |no  |yes|
Before|No  |1157|35 |
     |Yes |220 |13 |

results of chi-squared test: 
Chi^2 =  4.183     d.f. =  1     p =  0.04082 

results of McNemar's test: 
Chi^2 = 134.2 d.f. = 1 p = 4.901e-31

From my understanding, as the data are repeated measures, I must use McNemar’s test, which tests if the proportion of positive for X has changed.

But my questions seems to need the chi-squared test - testing if the proportion of positive for X after is related to the proportion of positive for X before.

I am not even sure if I understand the difference between McNemar’s test and the chi-squared correctly. What would be the right test if my question were, “Is the proportion of subjects infected with X after different from before?” 2. A similar case, but where instead of before and after, I measure two different infections at one point in time:

       Y   
     |no  |yes|
X|No  |1157|35 |
|Yes |220 |13 |

Which test would be right here if the question is “Does higher proportions of one infections relate to higher proportions of Y”? 3. If my question was, “Is infection Y at time t2 related to infection X at time t1?”, which test would be appropriate?

             Y at t2   
           |no  |yes|
X at t1|No  |1157|35 |
      |Yes |220 |13 |

I was using McNemar’s test in all these cases, but I have my doubts if that is the right test to answer my questions. I am using R. Could I use a binomial glm instead? Would that be analogous to the chi-squared test?

It is very unfortunate that McNemar’s test is so difficult for people to understand. I even notice that at the top of its Wikipedia page it states that the explanation on the page is difficult for people to understand. The typical short explanation for McNemar’s test is either that it is: ‘a within-subjects chi-squared test’, or that it is ‘a test of the marginal homogeneity of a contingency table’. I find neither of these to be very helpful. First, it is not clear what is meant by ‘within-subjects chi-squared’, because you are always measuring your subjects twice (once on each variable) and trying to determine the relationship between those variables. In addition, ‘marginal homogeneity’ is barely intelligible (I know what this means and I have a hard time moving from the words to the meaning). (Tragically, even this answer may be confusing. If it is, it may help to read my second attempt below.)

Let’s see if we can work through a process of reasoning about your top example to see if we can understand whether (and if so, why) McNemar’s test is appropriate. You have put:

enter image description here

This is a contingency table, so it connotes a chi-squared analysis. Moreover, you want to understand the relationship between and , and the chi-squared test checks for a relationship between the variables, so at first glance it seems like the chi-squared test must be the analysis that answers your question.

However, it is worth pointing out that we can also present these data like so:

enter image description here

When you look at the data this way, you might think you could do a regular old -test. But a -test isn’t quite right. There are two issues: First, because each row lists data measured from the same subject, we wouldn’t want to do a between-subjects -test, we would want to do a within-subjects -test. Second, since these data are distributed as a binomial, the variance is a function of the mean. This means that there is no additional uncertainty to worry about once the sample mean has been estimated (i.e., you don’t have to subsequently estimate the variance), so you don’t have to refer to the distribution, you can use the distribution. (For more on this, it may help to read my answer here: The -test vs. the test.) Thus, we would need a within-subjects -test. That is, we need a within-subjects test of equality of proportions.

We have seen that there are two different ways of thinking about and analyzing these data (prompted by two different ways of looking at the data). So we need to decide which way we should use. The chi-squared test assesses whether and are independent. That is, are people who were sick beforehand more likely to be sick afterwards than people who have never been sick. It is extremely difficult to see how that wouldn’t be the case given that these measurements are assessed on the same subjects. If you did get a non-significant result (as you almost do) that would simply be a type II error. Instead of whether and are independent, you almost certainly want to know if the treatment works (a question chi-squared does not answer). This is very similar to any number of treatment vs. control studies where you want to see if the means are equal, except that in this case your measurements are yes/no and they are within-subjects. Consider a more typical -test situation with blood pressure measured before and after some treatment. Those whose bp was above your sample average beforehand will almost certainly tend to be among the higher bps afterwards, but you don’t want to know about the consistency of the rankings, you want to know if the treatment led to a change in mean bp. Your situation here is directly analogous. Specifically, you want to run a within-subjects -test of equality of proportions. That is what McNemar’s test is.

So, having realized that we want to conduct McNemar’s test, how does it work? Running a between-subjects -test is easy, but how do we run a within-subjects version? The key to understanding how to do a within-subjects test of proportions is to examine the contingency table, which decomposes the proportions:

Obviously the proportions are the row totals divided by the overall total, and the proportions are the column totals divided by overall total. When we look at the contingency table we can see that those are, for example:

What is interesting to note here is that observations were yes both before and after. They end up as part of both proportions, but as a result of being in both calculations they add no distinct information about the change in the proportion of yeses. Moreover they are counted twice, which is invalid. Likewise, the overall total ends up in both calculations and adds no distinct information. By decomposing the proportions we are able to recognize that the only distinct information about the before and after proportions of yeses exists in the and , so those are the numbers we need to analyze. This was McNemar’s insight. In addition, he realized that under the null, this is a binomial test of against a null proportion of . (There is an equivalent formulation that is distributed as a chi-squared, which is what R outputs.)

There is another discussion of McNemar’s test, with extensions to contingency tables larger than 2x2, here.


Here is an R demo with your data:

mat = as.table(rbind(c(1157, 35), 
                    c( 220, 13) ))
colnames(mat) <- rownames(mat) <- c("No", "Yes")
names(dimnames(mat)) = c("Before", "After")
mat
margin.table(mat, 1)
margin.table(mat, 2)
sum(mat)

mcnemar.test(mat, correct=FALSE)
# McNemar's Chi-squared test
# 
# data: mat
# McNemar's chi-squared = 134.2157, df = 1, p-value < 2.2e-16
binom.test(c(220, 35), p=0.5)
# Exact binomial test
# 
# data: c(220, 35)
# number of successes = 220, number of trials = 255, p-value < 2.2e-16
# alternative hypothesis: true probability of success is not equal to 0.5
# 95 percent confidence interval:
# 0.8143138 0.9024996
# sample estimates:
# probability of success 
# 0.8627451 

If we didn’t take the within-subjects nature of your data into account, we would have a slightly less powerful test of the equality of proportions:

prop.test(rbind(margin.table(mat, 1), margin.table(mat, 2)), correct=FALSE)
# 2-sample test for equality of proportions without continuity
# correction
# 
# data: rbind(margin.table(mat, 1), margin.table(mat, 2))
# X-squared = 135.1195, df = 1, p-value < 2.2e-16
# alternative hypothesis: two.sided
# 95 percent confidence interval:
# 0.1084598 0.1511894
# sample estimates:
# prop 1 prop 2 
# 0.9663158 0.8364912 

That is, X-squared = 133.6627 instead of chi-squared = 134.2157. In this case, these differ very little, because you have a lot of data and only cases are overlapping as discussed above. (Another, and more important, problem here is that this counts your data twice, i.e., , instead of .)


Here are the answers to your concrete questions:

  1. The correct analysis is McNemar’s test (as discussed extensively above).
  2. This version is trickier, and the phrasing “does higher proportions of one infections relate to higher proportions of Y” is ambiguous. There are two possible questions:
  • It is perfectly reasonable to want to know if the patients who get one of the infections tend to get the other, in which case you would use the chi-squared test of independence. This question is asking whether susceptibility to the two different infections is independent (perhaps because they are contracted via different physiological pathways) or not (perhaps they are contracted due to a generally weakened immune system).
  • It is also perfectly reasonable to what to know if the same proportion of patients tend to get both infections, in which case you would use McNemar’s test. The question here is about whether the infections are equally virulent.
  1. Since this is once again the same infection, of course they will be related. I gather that this version is not before and after a treatment, but just at some later point in time. Thus, you are asking if the background infection rates are changing organically, which is again a perfectly reasonable question. At any rate, the correct analysis is McNemar’s test.

Edit: It would seem I misinterpreted your third question, perhaps due to a typo. I now interpret it as two different infections at two separate timepoints. Under this interpretation, the chi-squared test would be appropriate.

引用自:https://stats.stackexchange.com/questions/76875

comments powered by Disqus