optimization parameters in gamlj module

peterpc5j · Post by **peterpc5j** » Tue Apr 28, 2020 9:16 pm

Dear all,

I am considering to move some analyses in R to jamovi - particularly those related to the family of mixed models. First, I would like to know why I get different results, comparing R and jamovi (gamlj module), for linear mixed model analyses which I thought would be equivalent. In fact, looking at deviance (is that what in gamlj calls "LogLikel."), the result I get in jamovi is better.
I thought gamlj was kind of a wrapper around lme4 or lmerTest for most of the funcionality. When I run the analysis in syntax mode, and use the generated model in a call to gamlj::gamljMixed in R, I get the result as shown when I do this in jamovi. However, when I take the model formulation and feed it to lmer, setting the options to REML = TRUE and using the bobyqa optimizer, the fixed and random effects estimates, fitness measures and p-values are different, although I get the same number of parameters in the output (so I suppose the number of degrees of freedom is identical and the models to lmer and gamlj are in effect the same). They are not dramatically different but they go beyond rounding.
To give you an idea:
gamlj fit and some estimates
AIC: 123775.8378
BIC: 123845.2377
LogLikel.: 123757.5462
Fixed Effects Parameter Estimates
─────────────────────────────────────────────────────────────────────────────────────────
Names Estimate SE Lower Upper df t p
─────────────────────────────────────────────────────────────────────────────────────────
(Intercept) 2.8923 0.571 1.7739 4.011 19.1 5.0688 < .001
Random Components
─────────────────────────────────────────────────────────────
Groups Name SD Variance ICC
─────────────────────────────────────────────────────────────
Subject ...
(Intercept) 2.513 6.317 0.0711

Same through lmer:

REML criterion at convergence: 123762.7
AIC: 123780.7
BIC: 123850.4

Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 3.43159 0.63977 19.10459 5.364 3.49e-05 ***
Random effects:
Groups Name Variance Std.Dev.
Subject ...
Subject.3 (Intercept) 7.9726 2.8236

I thought that the difference might be due to convergence criteria, but I can't find out whether gamlj uses any other values than the bobyqa defaults. There is simply no documentation on that, and I couldn't immediately localize it in the source code.

So what might be going on?

Thanks to all,

Peter

mcfanda@gmail.com · Post by **mcfanda@gmail.com** » Fri May 01, 2020 8:21 am

Hi
I've tried an intercept-only model like yours with several datasets and I'm not able to reproduce the issue. Would you share an example of data in which the problem occurs?
thanks

peterpc5j · Post by **peterpc5j** » Tue May 05, 2020 3:01 pm

I'm sorry, my message maybe gave the impression that I'm only fitting a random intercept. I only copy-pasted enough of the output to show that the estimates are different. In fact all effects are included as random factors, not only the intercept. I simulated data from the fitted model and attached it in zipped csv format. Below is a code that can be run in R. I included ML estimate besides REML (because I'm still not sure what the 'log-likelihood' column represents in REML). Both lme4 and gamlj provide estimates on 9 parameters. Although the difference is not as dramatically different as with my original data, gamlj is doing better than lmer again. The log-likelihood for lmer is -61893.01, while for gamlj it is -61891.6570. For what it's worth to infer what happens inside the optimizers, the lmer solution takes about 3 seconds on my computer, while gamlj needs about 14 seconds.

Code: Select all

# Check dependencies
require('lme4')
require('lmerTest')
require('gamlj')

# Read in the data
dataset <- read.csv(unz('dataset.csv.zip','dataset.csv'))

# Formula copy-pasted from jamovi/GAMLj
formula_str <- "Y ~ 1 + X_num + X_cat + X_num:X_cat + (0+X_num|S) + (0+X_cat|S) + (0+X_num:X_cat|S) + (1|S)"

# REML solutions with lmer, default and bobyqa optimizations, and gamljMixed
lmer_res_default <- lmer(formula = formula_str,  data=dataset)
lmer_res_bobyqa <- lmer(formula = formula_str,  data=dataset, control=lmerControl(optimizer='bobyqa'))
gamlj_res <- gamljMixed(formula = as.formula(formula_str),  data=dataset, correlatedEffects = "nocorr")

# ML solutions with lmer, default and bobyqa optimizations, and gamljMixed
lmer_res_default_ML <- lmer(formula = formula_str,  data=dataset, REML=FALSE)
lmer_res_bobyqa_ML <- lmer(formula = formula_str,  data=dataset, control=lmerControl(optimizer='bobyqa'), REML=FALSE)
gamlj_res_ML <- gamljMixed(formula = as.formula(formula_str),  data=dataset, correlatedEffects = "nocorr", reml=FALSE)

# Display output for comparison
print(summary(lmer_res_bobyqa_ML)$AICtab)
print(gamlj_res_ML$info)

mcfanda@gmail.com · Post by **mcfanda@gmail.com** » Wed May 06, 2020 8:44 am

Hi,
the difference is due to the scaling of the variables. gamlj default is to center the variables to their means (for several reasons, among which there is easier convergence and easier interpretation of the results). If you change the gamlj scaling method to "none" (so no scaling is done), you get exactly the same results in lmer.

Try this

Code: Select all

gamlj_res <- gamljMixed(formula = as.formula(formula_str),  data=dataset, correlatedEffects = "nocorr",
                        scaling =list(list(var="X_num",type="none"),list(var="X_cat",type="none")) )

peterpc5j · Post by **peterpc5j** » Thu May 07, 2020 1:41 pm

Thanks for the reply, that solves the issue indeed! I never thought that a linear reparametrization would generate so large a difference in model fit, that is quite instructive.
Inversely, I managed to reproduce the original GAMLj result using lme4 with the following code in R, in which the predictor variables are centered before fitting the model:

Code: Select all

dataset$X_num_c <- dataset$X_num-mean(dataset$X_num)
dataset$X_cat_c <- dataset$X_cat-mean(dataset$X_cat)
formula_str_c <- "Y ~ 1 + X_num_c + X_cat_c + X_num_c:X_cat_c + (0+X_num_c|S) + (0+X_cat_c|S) + (0+X_num_c:X_cat_c|S) + (1|S)"
lmer_res_bobyqa_ML_c <- lmer(formula = formula_str_c,  data=dataset, control=lmerControl(optimizer='bobyqa'), REML=FALSE)
print(summary(lmer_res_bobyqa_ML_c)$AICtab)

mcfanda@gmail.com · Post by **mcfanda@gmail.com** » Fri May 08, 2020 5:36 pm

Glad it worked

jamovi