Hypothesis Testing for Unbalanced Repeated Measures Design
Posted: Mon Jan 20, 2025 2:19 pm
Hello everyone,
I have some questions regarding my approach to hypothesis testing for an unbalanced repeated-measures design.
I've performed an analysis of variance (ANOVA) on a repeated-measures (RM) experiment with an unbalanced design, structured as follows:
Participants of different User Levels give a subjective rating of seven characteristics for a reference system and three variants. The three variants are distinguished by two IVs (RF and MC), i.e. the test matrix looks like this:
Reference: Very different methodically, not related to the IVs at all
V1: [RF on, MC off]
V2: [RF off, MC off]
V3: [RF on, MC on]
All participants rated all seven characteristics (DVs) for all three variants. The imbalance therefore stems from the missing test matrix cell, i.e. the N evaluations for the condition [RF off, MC on], which could not be tested.
The goal of my analysis is to investigate the main effects of the IVs on user ratings and the IVs' interaction with User Level.
Analysis via jamovi's RM ANOVA works fine, as do post-hoc analyses and interactions, if I use the variant (0, 1, 2, or 3) as a Repeated Measures Factor and User Level as a Between Subjects Factor. However, now that I would like to investigate the main effects, I've run into the following problem: to my knowledge, RM ANOVA cannot handle empty repeated-measures cells, so defining my IVs as RM Factors is not an option.
According to my research, Linear Mixed Models (LMMs) should be able to handle such unbalanced designs better. I've therefore plugged the same data into gamlj's Mixed Model in the following way:
Linear mixed model fit by REML
6 ~ 1 + MC + RF + UserLevel + MC:RF + (1 | UserID)
To my understanding, this means that the info on the RM nature of this design is captured via the Cluster Variable UserID, and I am allowing random effects (i.e. individual differences between users) on the model intercept through UserID. Unfortunately, this modeling approach doesn't seem successful, since many of the output cells remain empty, the plot is not fully populated (see attachments), and I am receiving the following warning:
WARNING: Some of the coefficients cannot be estimated because they are perfectly correlated with other coefficients in the model. This can be due to empty cells in the design or perfectly correlated covariates. The results may be uninterpretable.
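To illustrate why I suspect this warning fires, here is a small plain-Python sketch of the tested design cells (coding the on/off levels as 1/0 is my assumption): with the [RF off, MC on] cell missing, the MC:RF interaction column coincides with the MC column in every tested cell, which would match the "perfectly correlated" message.

```python
# The three tested cells of the design; [RF off, MC on] is missing.
variants = {
    "V1": {"RF": 1, "MC": 0},
    "V2": {"RF": 0, "MC": 0},
    "V3": {"RF": 1, "MC": 1},
}

# Build the MC:RF interaction column and compare it with the MC column.
mc_col = [cell["MC"] for cell in variants.values()]
int_col = [cell["RF"] * cell["MC"] for cell in variants.values()]
print(mc_col == int_col)  # True: the two columns are identical
```

If that reasoning is right, the model matrix is rank-deficient by construction, not because of anything in the data itself.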
My questions are:
- Do you see any general problems with my approach to this analysis? Am I on the wrong path with an LMM?
- Is there a smarter or better-suited analysis for the described setup?
- If my approach is not generally wrong, is there a way to check what's going wrong under the hood of my LMM?
I have read the GitHub examples but couldn't find a similarly unbalanced design.
Thanks in advance for your ideas,
Max
p.s.:
1. The reference system is not relevant for my hypothesis testing, I've therefore filtered it out using IF(`Vehicle`!='0')
2. To capture main effects, I have also performed Paired Samples t-tests on the lumped data, e.g. comparing the averaged responses from V1 and V2 against V3 to capture the effect of MC. However, I am unsure whether this is a sound approach, and it doesn't let me investigate interaction effects.
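As a rough sketch of the lumping in item 2 (the rating values below are made-up placeholders, not my real data): I average V1 and V2 per participant and compare the result with V3 via a paired t statistic.

```python
import math

# Hypothetical ratings per participant for one characteristic.
v1 = [4.0, 3.5, 5.0, 4.5]
v2 = [3.0, 4.0, 4.5, 4.0]
v3 = [5.0, 4.5, 5.5, 5.0]

# Lump the two MC-off variants, then take paired differences against V3.
lumped = [(a + b) / 2 for a, b in zip(v1, v2)]
diffs = [c - l for l, c in zip(lumped, v3)]

# Paired t statistic: mean difference over its standard error.
n = len(diffs)
mean = sum(diffs) / n
sd = math.sqrt(sum((d - mean) ** 2 for d in diffs) / (n - 1))
t = mean / (sd / math.sqrt(n))
print(round(t, 3))  # prints 5.0 for these placeholder values
```

This is what the jamovi Paired Samples t-test does internally, as far as I understand, but it obviously throws away the RF structure within the lumped pair.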