Hello everyone,
I have some questions regarding my approach to hypothesis testing for an unbalanced repeated-measures design.
I've performed an analysis of variance (ANOVA) on a repeated-measures (RM) experiment with an unbalanced design, which looks like this:
Participants of different User Levels perform a subjective rating of seven characteristics for a reference system and three variants. These three variants are distinguished by two IVs (RF and MC), i.e. the test matrix looks like this:
Reference: Very different methodically, not related to the IVs at all
V1: [RF on, MC off]
V2: [RF off, MC off]
V3: [RF on, MC on]
All participants rated all seven characteristics (DVs) for all three variants. Hence, the imbalance is due to the missing test-matrix cell, i.e. the N evaluations for the condition [RF off, MC on], which could not be tested.
The goal of my analysis is to investigate the main effects of the IVs on user ratings and the IVs' interaction with the User Level.
Analysis via jamovi's RM ANOVA works fine, as do post-hoc analyses and interactions, if I use the variant (0, 1, 2, or 3) as a Repeated Measures Factor and User Level as a Between Subjects Factor. However, now that I would like to investigate the main effects, I've run into the following problem: to my knowledge, RM ANOVA cannot handle empty repeated-measures cells, so defining my IVs as RM Factors is not an option.
According to my research, Linear Mixed Models (LMMs) should be able to handle such unbalanced designs better. I've therefore plugged the same data into gamlj's Mixed Model in the following way:
Linear mixed model fit by REML
DV6 ~ 1 + MC + RF + UserLevel + MC:RF + ( 1 | UserID )
To my understanding, this means that the information on the RM nature of this design is captured via the Cluster Variable UserID, and I am allowing random effects (i.e. individual differences between users) on the model intercept through UserID. Unfortunately, this modeling approach doesn't seem to be successful: many of the output cells remain empty, the plot is not fully populated (see attachments), and I am receiving the following warning:
WARNING: Some of the coefficients cannot be estimated because they are perfectly correlated with other coefficients in the model. This can be due to empty cells in the design or perfectly correlated covariates. The results may be uninterpretable.
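For anyone who wants to poke at this outside of jamovi: gamlj fits these models through R's lme4 package, so the same model can be sketched directly in R. This is only a sketch, assuming a long-format data frame d with columns DV6, MC, RF, UserLevel, and UserID as in the formula above; the last lines are diagnostics that expose the empty cell:

library(lme4)
# Same model as above: fixed effects for MC, RF, UserLevel and the MC:RF
# interaction, plus a random intercept per participant
m <- lmer(DV6 ~ 1 + MC + RF + UserLevel + MC:RF + (1 | UserID), data = d)
# The empty [RF off, MC on] cell shows up as a zero count here ...
table(d$RF, d$MC)
# ... and it makes the fixed-effects design matrix rank-deficient, i.e. some
# coefficients are aliased, which is what the warning above is reporting
X <- model.matrix(~ 1 + MC + RF + UserLevel + MC:RF, data = d)
qr(X)$rank < ncol(X)  # TRUE means some coefficients cannot be estimated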
My questions are:
- Do you see any general problems with my approach to this analysis? Am I on the wrong path with an LMM?
- Is there a smarter or better-suited analysis for the described setup?
- If my approach is not generally wrong, is there a way to check what's going wrong under the hood of my LMM?
I have read the GitHub examples, but I couldn't find a similarly unbalanced design.
Thanks in advance for your ideas,
Max
p.s.:
1. The reference system is not relevant for my hypothesis testing; I've therefore filtered it out using IF(`Vehicle`!='0').
2. To capture main effects, I have also performed paired-samples t-tests on the lumped data, e.g. comparing the averaged responses from V1 and V2 with V3 to capture the effect of MC (sketched below). I am, however, unsure whether this is a good approach, plus I can't investigate interaction effects this way.
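For concreteness, here is that lumped MC comparison as a small R sketch on the wide-format data (assuming a data frame d with per-variant columns DV6_V1, DV6_V2, DV6_V3):

# V1 and V2 are both MC off, V3 is MC on
mc_off <- rowMeans(cbind(d$DV6_V1, d$DV6_V2))  # per-participant "lumped" mean
t.test(mc_off, d$DV6_V3, paired = TRUE)        # paired t test: MC off vs. MC on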
Hypothesis Testing for Unbalanced Repeated Measures Design
- Attachments: LMM_In.png, LMM_Out.png
Re: Hypothesis Testing for Unbalanced Repeated Measures Design
Hi Maximus.
When I hear the term "unbalanced" I think "unequal numbers of observations." However, your design is what I would call not "unbalanced" but "incomplete." Had at least a few participants experienced the "RF off, MC on" condition, the LMM would likely have worked. But with zero "RF off, MC on" observations there's no way to conceptualize or compute an RF-by-MC interaction (thus the LMM doesn't provide results for such an interaction). In my opinion, because of the incompleteness of your design, you need to re-formulate your goals so that they don't pertain at all to main effects (of RF and MC) or to an RF-by-MC interaction.
Instead of the two factors, RF and MC, you need to re-code the data so that you have just one factor, "RF MC Condition," with the three levels of that factor being:
"Level 1: [RF on, MC off]"
"Level 2: [RF off, MC off]"
"Level 3: [RF on, MC on]"
In this way (if I understand your data correctly) you can assess whether there is a significant effect of "RF MC Condition," and you can use multiple comparisons to test the conditions against each other. All of this should work equally well as a repeated-measures ANOVA (wide data format) or as an LMM (long data format).
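In R, the recoding and the LMM version might look roughly as follows. This is just a sketch; I'm assuming a long-format data frame d with the column names from your model formula, and I'm using the lmerTest and emmeans packages (my choice, not something your setup requires) for the omnibus tests and the pairwise comparisons:

library(lmerTest)  # lmer() with Satterthwaite p-values
library(emmeans)
# Collapse the two incomplete factors into one complete three-level factor
d$Condition <- factor(ifelse(d$RF == "on" & d$MC == "off", "L1.RFon.MCoff",
                      ifelse(d$RF == "off" & d$MC == "off", "L2.RFoff.MCoff",
                                                            "L3.RFon.MCon")))
m <- lmer(DV6 ~ Condition + UserLevel + (1 | UserID), data = d)
anova(m)                        # omnibus tests
pairs(emmeans(m, ~ Condition))  # multiple comparisons between the three levels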
Re: Hypothesis Testing for Unbalanced Repeated Measures Design
Hi reason180,
thank you for your input. I was not sure how to describe the missing cell in my design; "incomplete" does make sense.
The solution you're suggesting is the same as my initial approach and works well for both RM ANOVA and LMM. Do I read correctly from your answer that there is no way I can make a reliable statement about the specific effects of the modifications (RF and MC) that goes beyond the pairwise comparisons, due to the incompleteness? Effect directions are very clear after pairwise comparisons, but it seemed to me like I was not utilizing the repeated measures, e.g. the 2*N evaluations for the condition MC off.
I take it that the comparison of effect sizes of the modifications will not be feasible from this analysis either. I appreciate that this might be a conceptual limitation due to my incomplete test matrix. If so, certainly a lesson learned for me.
Are there thoughts on my suggestion of "lumping" the means of the 2*N evaluations into N means of means to compare them via a dependent samples t-test? Is this a statistical test or am I just making this method up? I was unable to find information on this but it might be due to my incorrect or incomplete use of statistical terminology.
Thanks again for your thoughts on this
Max
Re: Hypothesis Testing for Unbalanced Repeated Measures Design
RE "Do I read correctly from your answer that, there is no way I can make a reliable statement about the specific effects of the modifications (RF and MC) that goes beyond the pairwise comparisons due to the incompleteness? [. . .] I take it that the comparison of effect sizes of the modifications will not be feasible from this analysis either. [. . .] Are there thoughts on my suggestion of "lumping" the means of the 2*N evaluations into N means of means to compare them via a dependent samples t-test?"
Given that all of your conditions vary within-subjects (i.e., they are repeated-measures), I think a simple way to accomplish what you want is to calculate the measure of interest for each participant separately, and then do a one-sample t test to assess the results.
Examples:
To assess whether the mean response is significantly different for the mean of Level 1 and Level 2, versus Level 3:
For each participant separately, compute a statistic--call it H--equal to the mean of the Level 1 and Level 2 responses, minus the Level 3 response. Conduct a one-sample t test to assess whether the mean H is significantly different from 0.0.
To assess whether the Level-1-versus-Level-2 difference is significantly different from the Level-3-versus-Level-2 difference:
Compute for each participant: H = ( (Level 1 minus Level 2) minus (Level 3 minus Level 2) ). Conduct a one-sample t test to assess whether the mean H is significantly different from 0.0.
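In R, with one row per participant (wide format; I'm assuming columns DV6_V1, DV6_V2, DV6_V3 for Levels 1 to 3), the two examples come down to:

# Example 1: mean of Level 1 and Level 2, minus Level 3
H1 <- (d$DV6_V1 + d$DV6_V2) / 2 - d$DV6_V3
t.test(H1, mu = 0)  # one-sample t test of H1 against 0
# Example 2: (Level 1 minus Level 2) minus (Level 3 minus Level 2),
# which algebraically reduces to Level 1 minus Level 3
H2 <- (d$DV6_V1 - d$DV6_V2) - (d$DV6_V3 - d$DV6_V2)
t.test(H2, mu = 0)  # one-sample t test of H2 against 0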
Re: Hypothesis Testing for Unbalanced Repeated Measures Design
Apologies for the late reply; I couldn't check this yesterday. I implemented your suggested approach in my long-format dataset today.
While this might not be surprising to someone with a better understanding of hypothesis testing than mine, I was surprised to find that the results were identical to those of my original approach (see attached image).
After dissecting what's happening, or rather where the difference between the two approaches lies in the equations below, I understand it now. Your approach of generating a measure of interest and testing it via a one-sample t-test sounds perfectly reasonable to me, so I am now more confident that this is a sensible approach. In short, I suppose I learned the difference between a paired-samples t-test and a one-sample t-test today. Thanks for your help!
My original approach, "lumped" means of paired evaluations as inputs to the paired-samples t-test:
DV6 (RF on) = MEAN(`DV6_V1`, `DV6_V3`)
DV6 (MC off) = MEAN(`DV6_V1`, `DV6_V2`)
reason180's approach, generating the measure of interest from the results of all three variants:
H (RF means) = MEAN(`DV6_V1`, `DV6_V3`) - `DV6_V2`
H (MC means) = MEAN(`DV6_V1`, `DV6_V2`) - `DV6_V3`
- Attachments: tTestsOut.png
Re: Hypothesis Testing for Unbalanced Repeated Measures Design
Yes. The underlying computation for doing a paired sample t test involves first computing difference scores and then doing a one-sample t test.
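A toy illustration of that equivalence in R (made-up numbers):

x <- c(5, 6, 7, 4)
y <- c(4, 6, 5, 3)
t.test(x, y, paired = TRUE)  # paired t test of x vs. y
t.test(x - y, mu = 0)        # one-sample t test on the differences:
                             # identical t, df, and p-value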