Hi there, I'm looking for the appropriate statistical test and couldn't find any similar topics discussed previously. Apologies if I missed one or if I'm posting in the wrong section; I'm new to this.
In my dataset, I have pulmonary function test (PFT) data collected from patients. For instance, one measure is forced vital capacity (FVC) in liters, recorded for each patient at baseline and 1 year later. From this data, I created an FVC variation variable by taking the difference between the two measurements.
One of my variables is a grouping variable which splits my patients into two groups, familial patients and sporadic patients. In case it matters somehow, the number of patients is different in each group (something like 95 and 110).
All normality tests are significant, which is why I used non-parametric tests.
My goal was to look for differences in FVC variation between the two groups (sporadic and familial).
I initially ran a Mann-Whitney U test, but then realised that FVC variation comes from repeated values in the same patient, so I thought a paired test (Wilcoxon signed-rank test) might be better suited. However, I can't find a way to add a grouping variable to compare familial patients with sporadic patients.
Am I wrong somewhere? Should I be using a Mann-Whitney U test, since the groups are independent, or a Wilcoxon signed-rank test? If the latter, can I add a grouping variable to this test, and how?
I hope I made myself clear and thanks ahead for the help.
Wilcoxon signed-rank, Mann-whitney U, else?
Re: Wilcoxon signed-rank, Mann-whitney U, else?
Hey @diegoD,
A quick suggestion would be to use ANCOVA with your variables.
When groups differ in baseline, ANCOVA can be used to control for these differences.
The usual way to do this ANCOVA is to use the posttest score as the dependent variable and the pretest score as a covariate (Group as a fixed factor). By removing the variance explained by the pretest from the posttest, the residual is variation that reflects the change from the pretest.
When groups are assigned at random, ANCOVA is an excellent method for comparing changes between groups.
However, when groups are naturally occurring, the baseline differences are not due to chance, and ANCOVA will yield biased conclusions.
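As a rough illustration of the ANCOVA setup described above (posttest as the dependent variable, pretest as a covariate, Group as a fixed factor), here is a minimal Python sketch that computes only the baseline-adjusted group difference. It gets there by removing the variation explained by the pretest from both the posttest and the group indicator, then regressing one set of residuals on the other; this gives the same group coefficient as fitting posttest ~ pretest + group. The function names and data values are hypothetical, and no p-value or assumption check is included.

```python
from statistics import mean


def slope_intercept(x, y):
    """Ordinary least-squares slope and intercept of y regressed on x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def residuals(x, y):
    """What is left of y after removing the part linearly explained by x."""
    slope, intercept = slope_intercept(x, y)
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


def ancova_group_effect(pre, post, group):
    """Group coefficient of post ~ pre + group, with group coded 0/1."""
    post_res = residuals(pre, post)    # posttest with pretest removed
    group_res = residuals(pre, group)  # group indicator with pretest removed
    effect, _ = slope_intercept(group_res, post_res)
    return effect


# Hypothetical complete cases: baseline FVC, 1-year FVC, group (0 or 1).
pre = [3.0, 3.5, 4.0, 2.5, 3.2, 3.8]
group = [0, 0, 0, 1, 1, 1]
post = [0.5 + 0.9 * p - 0.3 * g for p, g in zip(pre, group)]

print(ancova_group_effect(pre, post, group))  # close to -0.3 by construction
```

Rows with a missing pretest or posttest would need to be dropped before calling these functions.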
Cheers,
Maurizio
Re: Wilcoxon signed-rank, Mann-whitney U, else?
Hey Maurizio,
Thank you so much for the suggestion! If I understand you correctly, you would try the following:
Several things, however:
1) In this case, groups do not differ significantly in baseline FVC.
2) Groups are not assigned randomly; in this case, they are naturally occurring. You either have other cases in your family or you don't, and there is no intervention.
3) I get an error because some data is missing, and I don't have the same number of observations for every patient.
Therefore, I'm not entirely sure ANCOVA fits my variables. Hopefully, this will provide more clarity:
I'm trying to get something similar to this:
However, since FVC variation comes from two repeated measurements in the same patients, I'm not sure an independent test is adequate.
Hope this provides more clarification, and thanks again for your help.
Diego
Re: Wilcoxon signed-rank, Mann-whitney U, else?
@diegoD
Hi. It would be helpful if you could create a small, fake dataset that is similar to your real one (e.g., with some missing values), and post an omv (jamovi) file that includes the fake data.
Re: Wilcoxon signed-rank, Mann-whitney U, else?
Hi MAgojam
I'm aware of different ways of controlling for baseline, including repeated measures ANOVA, ANCOVA, and simple subtraction (see screenshot below). I'm curious to know why you think ANCOVA is the best method.
Re: Wilcoxon signed-rank, Mann-whitney U, else?
Hi @reason180,
I always read your posts and contributions in this forum with great interest, and this one is no exception.
My somewhat quick answer to @Diego, in which I pushed towards ANCOVA rather than RM ANOVA or linear mixed models, had two motivations: it lets him find an answer using jamovi Cloud as well (modules are currently available only for jamovi Desktop), and his study design (although not explicitly stated) seemed to me better suited to an ANCOVA.
I will say things that you surely know, but only for the convenience of exposition.
We can say that one of the underlying assumptions of RM ANOVA is that factor levels are randomized within subjects.
Here the factor levels are pre-measure and post-measure, which is unidirectional.
Thus the factor levels were not randomized within subjects, and RM ANOVA would not be indicated.
RM ANOVA would only be appropriate if the outcome was measured multiple times after the intervention.
reference: Pat Dugard & John Todman (1995) Analysis of Pre-test-Post-test Control Group Designs in Educational Research, Educational Psychology, 15:2, 181-198, DOI:10.1080/0144341950150207.
In ANCOVA, the dependent variable is the post-measure, the pre-measure is not an outcome, but a covariate.
This model evaluates differences in post means after accounting for pre values.
The two analyses answer different research questions.
If the question is whether the mean change in outcome from pre-measure to post-measure differed between the two groups, this is directly measured by the time*group interaction term in the RM ANOVA.
ANCOVA answers a different research question: whether post-measure means, adjusted for pre-measure values, differ between the two groups. The focus is on whether one group has a higher post-measure mean.
So ANCOVA is indicated when the research question is about the mean value at the end, not about gains, growth, or changes.
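To make the first of these questions concrete: with only two time points and complete data, the time*group interaction contrast is the difference between the groups' mean change scores. The short Python sketch below checks this on made-up numbers. The function names and values are hypothetical, and the sketch computes the contrast only, not the F test.

```python
from statistics import mean


def mean_change(pre, post):
    """Mean of the per-patient change scores (post minus pre)."""
    return mean(b - a for a, b in zip(pre, post))


def interaction_contrast(pre_a, post_a, pre_b, post_b):
    """Time*group contrast computed from the four cell means."""
    return (mean(post_a) - mean(pre_a)) - (mean(post_b) - mean(pre_b))


# Hypothetical FVC values in liters for two small groups.
pre_a, post_a = [3.0, 3.5, 4.0], [2.5, 3.0, 3.5]
pre_b, post_b = [2.5, 3.0, 4.5], [2.5, 2.5, 4.5]

from_changes = mean_change(pre_a, post_a) - mean_change(pre_b, post_b)
from_cells = interaction_contrast(pre_a, post_a, pre_b, post_b)
print(from_changes, from_cells)  # the two values agree up to rounding
```

The ANCOVA-adjusted difference generally differs from this quantity unless the pre-measure slope happens to be 1.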
Adjusting for the pre-measure values in ANCOVA has at least two advantages.
- Ensure that any post-measure differences truly result from the grouping factor (e.g. treatment) and are not a residual effect of pre-measure (usually random) differences between groups.
- Account for the variation around the post-measure means that results from the variation in where patients started at the pre-measure.
Both approaches remove subject-specific variation, and each works well in specific situations.
The important thing is not to combine them together so as not to remove the subject-specific variation twice.
Cheers,
Maurizio
Re: Wilcoxon signed-rank, Mann-whitney U, else?
Here, I created a fake dataset similar to the real one. Thanks for your help.
@MAgojam, feel free to take a look. I tried setting up the ANCOVA as you suggested, but I'm not sure I've done it properly, and it doesn't seem to meet the normality assumptions. Meanwhile, I'll take a look at the readings you provided, so thanks a lot. Again, in this case the groups are naturally occurring; are the results biased then?
If you need more information on the context I'll be more than happy to provide it.
Cheers,
Diego
Re: Wilcoxon signed-rank, Mann-whitney U, else?
Thanks Maurizio. I will take a look at the article.
Re: Wilcoxon signed-rank, Mann-whitney U, else?
@diegoD
Hi. Given that you are concerned about assumption violation and would like to take a non-parametric approach, the Mann-Whitney U test on difference scores seems reasonable to me. The signed-rank test will not work because it cannot accommodate additional or group factors. As you've seen, ANCOVA doesn't address the assumption violation you're concerned about, and neither does a repeated-measures ANOVA (see my new attachment). I did a square root transformation (attached), but that did not help on the fake data. You might try Winsorizing the data in each group to see if that fixes at least the unequal variance problem. For example, within each group separately, change the highest 10% of values so that they equal the highest unchanged value, and change the lowest 10% of values so that they equal the lowest unchanged value.
Original: 2, 4, 7, 8, 9, 13, 20, 21, 23, 35
Winsorized: 4, 4, 7, 8, 9, 13, 20, 21, 23, 23
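The Winsorizing rule described above could be sketched in Python roughly as follows. The function name `winsorize`, the `percent` parameter, and the group labels in the usage lines are hypothetical; the first list is the example from this post.

```python
def winsorize(values, percent=10):
    """Pull the lowest and highest `percent` of values in to the nearest unchanged value."""
    n = len(values)
    k = n * percent // 100  # number of values to replace in each tail
    if k == 0:
        return list(values)
    ordered = sorted(values)
    low = ordered[k]           # lowest value that stays unchanged
    high = ordered[n - k - 1]  # highest value that stays unchanged
    # Clamp every value into [low, high], keeping the original order.
    return [min(max(v, low), high) for v in values]


print(winsorize([2, 4, 7, 8, 9, 13, 20, 21, 23, 35]))
# [4, 4, 7, 8, 9, 13, 20, 21, 23, 23]

# Applied within each group separately (hypothetical data):
by_group = {"familial": [2, 4, 7, 8, 9, 13, 20, 21, 23, 35],
            "sporadic": [1, 5, 6, 6, 7, 9, 10, 12, 15, 40]}
winsorized = {name: winsorize(vals) for name, vals in by_group.items()}
```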
If that works then you could conduct an ANCOVA or repeated-measures analysis on the Winsorized data.
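The Mann-Whitney U comparison of difference scores mentioned at the start of this post could be sketched as below, using only the Python standard library. This is a minimal illustration under several assumptions: the change score is taken as follow-up minus baseline, missing measurements are coded as `None` and those patients are skipped, and the p-value uses the normal approximation with a tie correction and no continuity correction. The function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def change_scores(baseline, followup):
    """Follow-up minus baseline, skipping patients with a missing value (None)."""
    return [f - b for b, f in zip(baseline, followup)
            if b is not None and f is not None]


def mann_whitney_u(x, y):
    """Return (U for sample x, two-sided p-value from the normal approximation)."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = sorted(x + y)
    ranks, tie_term, i = {}, 0, 0
    while i < n:
        j = i
        while j < n and pooled[j] == pooled[i]:
            j += 1
        t = j - i                           # size of this group of tied values
        ranks[pooled[i]] = (i + 1 + j) / 2  # midrank of positions i+1 .. j
        tie_term += t ** 3 - t
        i = j
    u1 = sum(ranks[v] for v in x) - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = sqrt(n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1))))
    z = (u1 - mu) / sigma  # sigma is 0 only if every value is identical
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return u1, p


# Hypothetical FVC values in liters; None marks a missing measurement.
familial = change_scores([3.0, 3.5, None, 4.0], [2.5, 3.0, 2.0, 3.0])
sporadic = change_scores([2.5, 3.0, 4.5, 3.5], [2.5, None, 4.25, 3.5])
u, p = mann_whitney_u(familial, sporadic)
print(u, p)
```

With real data, a library routine such as `scipy.stats.mannwhitneyu` would be the more usual choice.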
Attachment: Fake Dataset plus.omv (196.49 KiB)
Re: Wilcoxon signed-rank, Mann-whitney U, else?
Hi Diego,
@reason180's suggestion of combining pre/post Winsorization with a classic ANCOVA is a step I'd try.
Right now I'm completing an update for Robust ANOVA in the Walrus module in the jamovi library.
I also intend to add the option for a Robust ANCOVA (Rand Wilcox [aut]), which might be just what you need.
I am attaching a screenshot where you can see a robust ANCOVA, both trimmed and bootstrap versions, from the 'WRS2' package run on your fake data.
Of course, this was done in the Rj editor module with system R, because 'WRS2' is not in the jamovi R library.
Cheers,
Maurizio