Ashar 2022 LME example

We'll fit the Ashar dataset with fitlme, Matlab's mixed effects model fitting function.
This is Matlab's object-oriented equivalent to lmer in R. It fits a variety of mixed effects models. A list of methods and properties is here:
help LinearMixedModel
LinearMixedModel  Fitted linear mixed effects model.
   LME = FITLME(...) fits a linear mixed effects model to data. The fitted model LME is a
   LinearMixedModel modeling a response variable as a linear function of fixed effect
   predictors and random effect predictors.

   LinearMixedModel methods:
      coefCI - Coefficient confidence intervals
      coefTest - Linear hypothesis test on coefficients
      predict - Compute predicted values given predictor values
      random - Generate random response values given predictor values
      partialDependence - Partial Dependence values
      plotPartialDependence - Plot Partial Dependence
      plotResiduals - Plot of residuals
      designMatrix - Fixed and random effects design matrices
      fixedEffects - Stats on fixed effects
      randomEffects - Stats on random effects
      covarianceParameters - Stats on covariance parameters
      fitted - Fitted response
      residuals - Various types of residuals
      response - Response used to fit the model
      compare - Compare fitted models
      anova - Marginal tests for fixed effect terms
      disp - Display a fitted model

   LinearMixedModel properties:
      FitMethod - Method used for fitting (either 'ML' or 'REML')
      MSE - Mean squared error (estimate of residual variance)
      Formula - Representation of the model used in this fit
      LogLikelihood - Log of likelihood function at coefficient estimates
      DFE - Degrees of freedom for residuals
      SSE - Error sum of squares
      SST - Total sum of squares
      SSR - Regression sum of squares
      CoefficientCovariance - Covariance matrix for coefficient estimates
      CoefficientNames - Coefficient names
      NumCoefficients - Number of coefficients
      NumEstimatedCoefficients - Number of estimated coefficients
      Coefficients - Coefficients and related statistics
      Rsquared - R-squared and adjusted R-squared
      ModelCriterion - AIC and other model criteria
      VariableInfo - Information about variables used in the fit
      ObservationInfo - Information about observations used in the fit
      Variables - Table of variables used in fit
      NumVariables - Number of variables used in fit
      VariableNames - Names of variables used in fit
      NumPredictors - Number of predictors
      PredictorNames - Names of predictors
      ResponseName - Name of response
      NumObservations - Number of observations in the fit
      ObservationNames - Names of observations in the fit

   See also fitlme, fitlmematrix, LinearModel, GeneralizedLinearModel, NonLinearModel.
   Documentation for LinearMixedModel
These models, like many other statistics objects implementing linear models, allow you to specify model formulas using Wilkinson notation. There is also a matrix input version.
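For orientation, here is a minimal sketch of both input styles on a tiny synthetic dataset (placeholder variable names; this is not the back pain data):
% Placeholder data: 10 subjects, 4 observations each
subject = repelem((1:10)', 4);
x = repmat((1:4)', 10, 1);
y = x + randn(40, 1);
tbl = table(y, x, categorical(subject), 'VariableNames', {'y', 'x', 'subject'});
 
% Formula (Wilkinson notation) interface: random intercept and slope of x within subject
lme1 = fitlme(tbl, 'y ~ 1 + x + (1 + x | subject)');
 
% Matrix interface: fixed-effects design, response, random-effects design, grouping variable
lme2 = fitlmematrix([ones(40,1) x], y, [ones(40,1) x], subject);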

Load the dataset

This dataset is from:
Ashar et al. 2022, JAMA Psychiatry
This study is a randomized clinical trial (RCT) of patients with chronic back pain (>= 4/10 pain for at least 3 months, average duration ~10 years). The trial compares Pain Reprocessing Therapy (PRT), Placebo (a single injection into the skin of the back), and Usual Care treatments, with pre-treatment and post-treatment assessments (1 month after treatment initiation) and follow-up to 1 year.
backpain = readtable('Ashar_backpain_simple_long.csv');
backpain
backpain = 341×6 table
          id    time    group    bpi_intensity    gender    is_patient
          __    ____    _____    _____________    ______    __________
     1    12      1       3         2.5             1           1
     2    12      2       3         2.75            1           1
     3    14      1       3         2.5             2           1
     4    14      2       3         2               2           1
     5    15      1       3         2.25            1           1
     6    15      2       3         2.75            1           1
     ⋮    (the original display continues through row 100 of 341)
Remember: You need the dataset on your Matlab path for this to work.
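If readtable errors because the file cannot be found, a quick check is sketched below (the folder path is a placeholder):
% Add the data folder to the path if the file is not already visible
if ~exist('Ashar_backpain_simple_long.csv', 'file')
    addpath('/path/to/Ashar_data');   % replace with the folder containing the .csv files
end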

Fit the model

Fit a model testing the Group x Time (Post - Pre-treatment) interaction.
lme = fitlme(backpain,'bpi_intensity ~ group*time + (time | id)', 'FitMethod','REML');
disp(lme);
Linear mixed-effects model fit by REML

Model information:
    Number of observations             286
    Fixed effects coefficients           4
    Random effects coefficients        302
    Covariance parameters                4

Formula:
    bpi_intensity ~ 1 + time*group + (1 + time | id)

Model fit statistics:
    AIC       BIC       LogLikelihood    Deviance
    958.91    988.05    -471.46          942.91

Fixed effects coefficients (95% CIs):
    Name                   Estimate    SE         tStat      DF     pValue        Lower      Upper
    {'(Intercept)'}         7.4115     0.48139     15.396    282    3.0697e-39     6.4639      8.3591
    {'time'       }        -3.4186     0.29129    -11.736    282    3.6424e-26     -3.992     -2.8452
    {'group'      }        -1.1237     0.22222    -5.0569    282    7.6861e-07    -1.5611    -0.68631
    {'time:group' }         1.0362      0.1337     7.7504    282    1.6691e-13    0.77305      1.2994

Random effects covariance parameters (95% CIs):
Group: id (151 Levels)
    Name1                  Name2                  Type        Estimate    Lower    Upper
    {'(Intercept)'}        {'(Intercept)'}        {'std' }     0.97441    NaN      NaN
    {'time'       }        {'(Intercept)'}        {'corr'}    -0.14395    NaN      NaN
    {'time'       }        {'time'       }        {'std' }     0.33244    NaN      NaN

Group: Error
    Name             Estimate    Lower    Upper
    {'Res Std'}      0.87779     NaN      NaN

Diagnose problems!

There are a few things wrong here. What are they?
It's a great idea to do "sanity checks" to understand what the model is doing and what might be going wrong.
Hints (a few quick checks are sketched in the code after this list):
  1. Look at the dfe. Does it look right?
  2. Look at the design matrix and the number of fixed effects. Is it as intended?
  3. Does it make sense to include healthy individuals scanned only once? Are they being included?
  4. Are we effectively using the "difference score" method here? Is there another way?
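A few quick checks along these lines (a sketch, using the lme object and backpain table from above):
% Residual degrees of freedom the model is using
lme.DFE
 
% Fixed-effect coefficient names and count
lme.CoefficientNames
lme.NumCoefficients
 
% How many unique people, and how many observations per time point for patients vs. controls?
numel(unique(backpain.id))
groupsummary(backpain, {'is_patient', 'time'})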

Issue 1: Fixed effects dfe

The dfe for the fixed effects is very large - much larger than the number of subjects, and in fact comparable to the total number of observations.
The fixed effects table does not correct the dfe for the effective degrees of freedom, and therefore does not account for the reduction in dfe caused by correlated errors due to random effects!
To estimate the dfe more appropriately, we can use the Satterthwaite approximation.
[~, ~, fixedstats] = fixedEffects(lme, 'DFMethod', 'Satterthwaite');
disp(fixedstats)
Fixed effect coefficients: DFMethod = 'Satterthwaite', Alpha = 0.05
    Name                   Estimate    SE         tStat      DF        pValue        Lower      Upper
    {'(Intercept)'}         7.4115     0.48139     15.396    145.67     2.331e-32     6.4601     8.3629
    {'time'       }        -3.4186     0.29129    -11.736     136.4    2.0222e-22    -3.9946    -2.8426
    {'group'      }        -1.1237     0.22222    -5.0569    144.41     1.273e-06    -1.5629    -0.6845
    {'time:group' }         1.0362      0.1337     7.7504    135.71    1.9285e-12    0.77182     1.3006
 
% Contrasts: Simple (trivial) contrast for the Group x Time interaction
disp('Group x Time')
Group x Time
[PVAL,F,DF1,DF2] = coefTest(lme, [0 0 0 1], 0, 'DFMethod', 'Satterthwaite')
PVAL = 1.9285e-12
F = 60.0683
DF1 = 1
DF2 = 135.7109
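coefTest also accepts a multi-row contrast matrix; for example, a joint F-test of the group and interaction terms (a sketch, assuming the coefficient order shown above: intercept, time, group, time:group):
% Joint test of group and time:group (one contrast per row)
H = [0 0 1 0; 0 0 0 1];
[PVAL, F, DF1, DF2] = coefTest(lme, H, [0; 0], 'DFMethod', 'Satterthwaite')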
 

Issue 2: Mis-coding a predictor

Look at the design matrix.
X = designMatrix(lme, 'Fixed');
 
X_table = array2table(X, 'VariableNames', lme.CoefficientNames);
X_table
X_table = 341×4 table
          (Intercept)    time    group    time:group
          ___________    ____    _____    __________
     1         1          1        3          3
     2         1          2        3          6
     3         1          1        3          3
     4         1          2        3          6
     5         1          1        3          3
     6         1          2        3          6
     7         1          1        2          2
     8         1          2        2          4
     ⋮    (the original display continues through row 100 of 341)
There is only 1 column for Group, but we know we need 2 columns to span the space of differences among 3 groups!
Group is coded with numbers, and is being treated as a numeric variable. The model is looking for linear increases (or decreases) in pain as a function of the numeric code! If we want lme to treat Group as a categorical variable, we need to recast it as a categorical variable class.
backpain.group = categorical(backpain.group);
Now we can re-run the model:
lme = fitlme(backpain,'bpi_intensity ~ group*time + (time | id)', 'FitMethod','REML');
disp(lme);
Linear mixed-effects model fit by REML

Model information:
    Number of observations             286
    Fixed effects coefficients           6
    Random effects coefficients        302
    Covariance parameters                4

Formula:
    bpi_intensity ~ 1 + time*group + (1 + time | id)

Model fit statistics:
    AIC       BIC       LogLikelihood    Deviance
    952.38    988.73    -466.19          932.38

Fixed effects coefficients (95% CIs):
    Name                     Estimate    SE         tStat      DF     pValue        Lower      Upper
    {'(Intercept)'  }          6.598     0.31156     21.177    280     4.434e-60     5.9847     7.2113
    {'time'         }         -2.628     0.18597    -14.131    280    1.3627e-34    -2.9941    -2.2619
    {'group_2'      }        -2.0483      0.4391    -4.6647    280    4.7925e-06    -2.9127    -1.1839
    {'group_3'      }        -2.2628      0.4385    -5.1603    280    4.6771e-07     -3.126    -1.3996
    {'time:group_2' }         1.7695      0.2628     6.7331    280    9.3944e-11     1.2521     2.2868
    {'time:group_3' }         2.0878     0.25945      8.047    280    2.4295e-14     1.5771     2.5986

Random effects covariance parameters (95% CIs):
Group: id (151 Levels)
    Name1                  Name2                  Type        Estimate    Lower          Upper
    {'(Intercept)'}        {'(Intercept)'}        {'std' }    0.94359     3.3893e-16     2.627e+15
    {'time'       }        {'(Intercept)'}        {'corr'}    0.14217     -1             1
    {'time'       }        {'time'       }        {'std' }    0.19832     2.7884e-141    1.4105e+139

Group: Error
    Name             Estimate    Lower         Upper
    {'Res Std'}      0.87048     0.00020409    3712.8
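A quick sanity check that Group is now spanned by two dummy-coded columns (a sketch):
% The model should now have 6 fixed-effect columns, including group_2 and group_3
lme.CoefficientNames
X = designMatrix(lme, 'Fixed');
size(X)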
The Group codes are rather awkward. Group 1 is the reference group, so the positive time:group_2 coefficient means that pain changed more positively over time -- that is, it decreased less -- in Group 2 (Placebo) than in Group 1 (PRT, the treatment group).
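With this coding, specific comparisons come from contrasts over the fixed effects. A sketch, assuming the coefficient order shown above (intercept, time, group_2, group_3, time:group_2, time:group_3):
% Post - Pre change within Group 2 (Placebo): time + time:group_2
coefTest(lme, [0 1 0 0 1 0], 0, 'DFMethod', 'Satterthwaite')
 
% Difference in the time effect between Group 2 and Group 3: time:group_2 - time:group_3
coefTest(lme, [0 0 0 0 1 -1], 0, 'DFMethod', 'Satterthwaite')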
How do we do an F-test for Group overall? The anova() method will do it:
anova(lme, 'DFMethod', 'Satterthwaite')
ans =
ANOVA marginal tests: DFMethod = 'Satterthwaite'
    Term                   FStat     DF1    DF2       pValue
    {'(Intercept)'}        448.47    1      145.69    2.5654e-46
    {'time'       }        199.69    1      135.92    1.8529e-28
    {'group'      }        16.168    2      144.82    4.5931e-07
    {'time:group' }        37.109    2      135.44    1.4098e-13

Issue 3: The difference score method

The effect of Time here is the equivalent of the Post vs. Pre difference score method we used earlier, when we created contrasts across the within-person effect of time.
This kind of subtraction can be super useful, as there may be a lot of individual variability in overall pain scores, and by collecting a baseline score we can "subtract it away", reducing error. And it is the improvement we are interested in, after all.
The subtraction method is widely used -- for example, subtracting high vs. low cognitive demand to yield a difference score. The Stroop, Simon, Flanker, Go/No-go, IAT, and many more tests use this method.
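For concreteness, here is a minimal sketch of the explicit difference-score version of this analysis, using the wide-format file loaded in the next code block (variable names BPI_Time1 and BPI_Time2 are taken from the model formula below):
% Explicit difference-score analysis: regress the Post - Pre change on Group
bpw = readtable('Ashar_backpain_simple_wide.csv');
bpw.group = categorical(bpw.group);
bpw.bpi_change = bpw.BPI_Time2 - bpw.BPI_Time1;   % change score (Post - Pre)
lm_diff = fitlm(bpw, 'bpi_change ~ group');        % one row per person, ordinary GLM
anova(lm_diff)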
There are a couple of problems with this, though. One alternative is to adjust for the baseline score as a covariate rather than subtracting it, which is what the model below does:
% For this the "wide" format is more convenient
backpainwide = readtable('Ashar_backpain_simple_wide.csv');
 
backpainwide.group = categorical(backpainwide.group);
 
lme = fitlme(backpainwide,'BPI_Time2 ~ group + BPI_Time1');
disp(lme);
Linear mixed-effects model fit by ML

Model information:
    Number of observations             135
    Fixed effects coefficients           4
    Random effects coefficients          0
    Covariance parameters                1

Formula:
    BPI_Time2 ~ 1 + group + BPI_Time1

Model fit statistics:
    AIC       BIC       LogLikelihood    Deviance
    423.39    437.92    -206.7           413.39

Fixed effects coefficients (95% CIs):
    Name                   Estimate    SE          tStat      DF     pValue        Lower      Upper
    {'(Intercept)'}        -1.0502       0.3473    -3.0238    131     0.0030043    -1.7372    -0.36312
    {'group_2'    }         1.6587      0.23937     6.9292    131    1.7252e-10     1.1851      2.1322
    {'group_3'    }         2.0183      0.23594      8.554    131    2.7305e-14     1.5515       2.485
    {'BPI_Time1'  }        0.60256     0.074734     8.0627    131    4.1045e-13    0.45472      0.7504

Random effects covariance parameters (95% CIs):
Group: Error
    Name             Estimate    Lower      Upper
    {'Res Std'}      1.1187      0.99288    1.2604
 
anova(lme, 'DFMethod', 'Satterthwaite')
ans =
ANOVA marginal tests: DFMethod = 'Satterthwaite'
    Term                   FStat     DF1    DF2    pValue
    {'(Intercept)'}        9.1433    1      135    0.0029886
    {'group'      }        40.98     2      135    1.2354e-14
    {'BPI_Time1'  }        65.007    1      135    3.5533e-13
 
Of course, now we have one vector of independent errors (they are not stacked across time points within-person), so there are no random effects. Or, we should say, the errors are not grouped by a random factor (participant). So we do not need a mixed effects model to do this: any OLS GLM will work.
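For example, the same model can be refit with ordinary least squares (a sketch using fitlm); the coefficient estimates should match, though variance estimation details differ slightly between ML and OLS:
% Same fixed-effects-only model as an ordinary GLM
lm = fitlm(backpainwide, 'BPI_Time2 ~ group + BPI_Time1');
disp(lm)
anova(lm)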