Sleepstudy dataset example
We'll fit the "sleepstudy" dataset, a standard demo dataset in R, with three different mixed effects models:
- fitlme, Matlab's mixed effects model
- IGLS, CANlab's Iterative Generalized Least Squares 2-level model
- glmfit_multilevel, CANlab's two-stage reweighted summary statistics model
Dataset description
Description
The average reaction time per day (in milliseconds) for subjects in a sleep deprivation study.
Days 0-1 were adaptation and training (T1/T2), day 2 was baseline (B); sleep deprivation started after day 2.
Format
A data frame with 180 observations on the following 3 variables.
Reaction: Average reaction time (ms)
Days: Number of days of sleep deprivation
Subject: Subject number on which the observation was made.
Details
These data are from the study described in Belenky et al. (2003), for the most sleep-deprived group (3 hours time-in-bed) and for the first 10 days of the study, up to the recovery period. The original study analyzed speed (1/(reaction time)) and treated day as a categorical rather than a continuous predictor.
References
Gregory Belenky, Nancy J. Wesensten, David R. Thorne, Maria L. Thomas, Helen C. Sing, Daniel P. Redmond, Michael B. Russo and Thomas J. Balkin (2003) Patterns of performance degradation and restoration during sleep restriction and subsequent recovery: a sleep dose-response study. Journal of Sleep Research 12, 1--12.
Load the dataset
Before you start, you'll want the dataset on your path:
sleepstudy = readtable('sleepstudy.csv');
fitlme
lme = fitlme(sleepstudy,'Reaction ~ Days + (Days | Subject)');
disp(lme)
Linear mixed-effects model fit by ML
Model information:
Number of observations 180
Fixed effects coefficients 2
Random effects coefficients 36
Covariance parameters 4
Formula:
Reaction ~ 1 + Days + (1 + Days | Subject)
Model fit statistics:
AIC BIC LogLikelihood Deviance
1763.9 1783.1 -875.97 1751.9
Fixed effects coefficients (95% CIs):
Name Estimate SE tStat DF pValue Lower Upper
{'(Intercept)'} 251.41 6.6323 37.906 178 3.6634e-87 238.32 264.49
{'Days' } 10.467 1.5022 6.9678 178 6.0243e-11 7.5028 13.432
Random effects covariance parameters (95% CIs):
Group: Subject (18 Levels)
Name1 Name2 Type Estimate Lower Upper
{'(Intercept)'} {'(Intercept)'} {'std' } 23.781 15.017 37.658
{'Days' } {'(Intercept)'} {'corr'} 0.08132 -0.50558 0.61677
{'Days' } {'Days' } {'std' } 5.7168 3.8055 8.5882
Group: Error
Name Estimate Lower Upper
{'Res Std'} 25.592 22.8 28.725
Note: The degrees of freedom and P-values in the fixed effects table are wrong!
For valid estimates of the dfe, you must correct the df using the Satterthwaite method, or do model comparisons to test significance levels. I prefer the former because it's straightforward.
[~, ~, fixedstats] = fixedEffects(lme, 'DFMethod', 'Satterthwaite');
disp(fixedstats)
Fixed effect coefficients: DFMethod = 'Satterthwaite', Alpha = 0.05
Name Estimate SE tStat DF pValue Lower Upper
{'(Intercept)'} 251.41 6.6323 37.906 18 1.2668e-18 237.47 265.34
{'Days' } 10.467 1.5022 6.9678 18 1.6524e-06 7.3112 13.623
The default is maximum likelihood estimation, so let's try it using ReML instead:
lme = fitlme(sleepstudy,'Reaction ~ Days + (Days | Subject)', 'FitMethod','REML');
disp(lme)
Linear mixed-effects model fit by REML
Model information:
Number of observations 180
Fixed effects coefficients 2
Random effects coefficients 36
Covariance parameters 4
Formula:
Reaction ~ 1 + Days + (1 + Days | Subject)
Model fit statistics:
AIC BIC LogLikelihood Deviance
1755.6 1774.7 -871.81 1743.6
Fixed effects coefficients (95% CIs):
Name Estimate SE tStat DF pValue Lower Upper
{'(Intercept)'} 251.41 6.8246 36.838 178 3.3479e-85 237.94 264.87
{'Days' } 10.467 1.5458 6.7715 178 1.7872e-10 7.4169 13.518
Random effects covariance parameters (95% CIs):
Group: Subject (18 Levels)
Name1 Name2 Type Estimate Lower Upper
{'(Intercept)'} {'(Intercept)'} {'std' } 24.74 15.582 39.283
{'Days' } {'(Intercept)'} {'corr'} 0.065551 -0.51839 0.6078
{'Days' } {'Days' } {'std' } 5.9221 3.9183 8.9508
Group: Error
Name Estimate Lower Upper
{'Res Std'} 25.592 22.8 28.725
[~, ~, fixedstats] = fixedEffects(lme, 'DFMethod', 'Satterthwaite');
disp(fixedstats)
Fixed effect coefficients: DFMethod = 'Satterthwaite', Alpha = 0.05
Name Estimate SE tStat DF pValue Lower Upper
{'(Intercept)'} 251.41 6.8246 36.838 17 1.1709e-17 237.01 265.8
{'Days' } 10.467 1.5458 6.7715 17 3.2638e-06 7.206 13.729
Let's get the BLUPs, the Best Linear Unbiased Predictors.
- These are estimates of the individual parameters for each person, with shrinkage of high-variance estimates towards the group values.
- They are deviations around the group (population) values, so we have to add the population parameters to get the individual slopes.
- These are what Doug Bates calls conditional means, E[beta|Y=y], evaluated at the estimated parameters
[~, ~, randomstats] = randomEffects(lme);
blups = reshape(randomstats.Estimate, 2, length(unique(randomstats.Level)))';
levelnames = unique(randomstats.Level);
bnames = unique(randomstats.Name, 'stable')
blups = array2table(blups, 'Variablenames', bnames, 'RowNames', levelnames )
blups = 18×2 table
| (Intercept) | Days |
---|
1 308 | 2.2586 | 9.1990 |
---|
2 309 | -40.3986 | -8.6197 |
---|
3 310 | -38.9603 | -5.4489 |
---|
4 330 | 23.6905 | -4.8143 |
---|
5 331 | 22.2602 | -3.0699 |
---|
6 332 | 9.0395 | -0.2722 |
---|
7 333 | 16.8404 | -0.2236 |
---|
8 334 | -7.2326 | 1.0746 |
---|
9 335 | -0.3337 | -10.7522 |
---|
10 337 | 34.8904 | 8.6283 |
---|
11 349 | -25.2101 | 1.1734 |
---|
12 350 | -13.0700 | 6.6142 |
---|
13 351 | 4.5778 | -3.0153 |
---|
14 352 | 20.8636 | 3.5360 |
---|
15 369 | 3.2755 | 0.8722 |
---|
16 370 | -25.6129 | 4.8225 |
---|
17 371 | 0.8070 | -0.9882 |
---|
18 372 | 12.3145 | 1.2840 |
---|
% ---------------------------
indiv_slopes = table2array(blups) + lme.Coefficients.Estimate';
days = length(unique(sleepstudy.Days));
% This is a local function created to show a simple plot of slopes
plot_indiv_slopes(indiv_slopes, days);
IGLS: Iterative generalized least squares
% To use IGLS and glmfit_multilevel, we need to reformat the dataset to
sleepunstacked = unstack(sleepstudy,'Reaction','Days');
Warning: Table variable names that were not valid MATLAB identifiers have been modified. Since table variable names must be unique, any table variable names that happened to match the new identifiers also have been modified.
To use the original INDVAR values as table variable names, set 'VariableNamingRule' to 'preserve'.
sleepunstacked = table2array(sleepunstacked(:, 2:end))';
[n, m] = size(sleepunstacked); % n = days, m = subjects
sleep_rt_cell = mat2cell(sleepunstacked, n, ones(1, m));
X1 = repmat({daycode}, 1, m);
% IGLS requires CANlab tools
% see also rigls.m for ReML estimation
igls_stats = igls(sleep_rt_cell, X1, 'plot', 'all');
MST (diag of Ystar_mtx): 2251.39771
MSE (diag of resid): -0.00001


igls.m output report:
---------------------------------------
Data: 10 observations x 18 subjects
Fit Type: igls
AR(p) model: No
Converged: Yes
Max iterations: 5, Completed iterations: 1
Epsilon for convergence: 0.010000
Elapsed time: 0.29 s
Statistics: Tests of inference on fixed population parameters
Parameter est. t( 17) p
Intcpt. 251.405 39.55 0.000000 ***
Slope1 10.467 7.06 0.000002 ***
Statistics: Tests of significance on random effects (variances)
Parameter est. LRT p
Intcpt. 497.258 22.09 0.000000 ***
Slope1 31.449 42.01 0.000000 ***
---------------------------------------
% Create the same simple plot as before, to compare
indiv_slopes = igls_stats.beta_indiv';
plot_indiv_slopes(indiv_slopes, days);
Here is a different plot, with different color options.
This plots the naive estimates of individual slopes, not the BLUPs. But you can see that they are very close to the BLUPS in this case.
This requires CANlab tools)
create_figure('example');
handles = line_plot_multisubject(X1, sleep_rt_cell, 'colors', seaborn_colors(length(X1)));
Warnings:
____________________________________________________________________________________________________________________________________________
Y: input cells with low varability:
2
____________________________________________________________________________________________________________________________________________
____________________________________________________________________________________________________________________________________________
Input data:
X scaling: No centering or z-scoring
Y scaling: No centering or z-scoring
Transformations:
No data transformations before plot
Correlations:
r = 0.54 across all observations, based on untransformed input data
Stats on slopes after transformation, subject is random effect:
Mean b = 10.47, t( 17) = 6.77, p = 0.000003, num. missing: 0
Average within-person r = 0.71 +- 0.38 (std)
Between-person r (across subject means) = NaN, p = NaN
____________________________________________________________________________________________________________________________________________
set(handles.point_handles, 'MarkerSize', 8, 'Marker', 'o');
set(gca, 'FontSize', 18);
xlabel('Days of sleep deprivation'); ylabel('Reaction Time (RT)');
glmfit_multilevel
glmfit_stats = glmfit_multilevel(sleep_rt_cell, X1, [], 'names', {'Days'}, 'verbose', 'weighted', 'plot');
Analysis description: Second Level of Multilevel Model
GLS analysis
Observations: 18, Predictors: 1
Outcome names: Intercept Days
Weighting option: weighted
Inference option: t-test
Other Options:
Plots: Yes
Robust: no
Save: No
Output file: glmfit_general_output.txt
R & B - type Empirical Bayes reweighting.
Total time: 0.04 s
________________________________________
Second Level of Multilevel Model
Outcome variables:
Intercept Days
Adj. mean 250.61 10.11
2nd-level B01
Intercept Days
Coeff 250.61 10.11
STE 7.14 1.55
t 35.09 6.53
Z 8.21 4.54
p 0.00000 0.00001
________________________________________


function plot_indiv_slopes(indiv_slopes, days)
m = size(indiv_slopes, 1); % number of subjects
colors = seaborn_colors(m);
plot([0 days], [indiv_slopes(i, 1) indiv_slopes(i, 1) + days * indiv_slopes(i, 2)], 'color', colors{i}, 'LineWidth', 2);
% b = lme.Coefficients.Estimate;
% plot([0 days], [b(1) b(1) + days * b(2)], 'color', 'k', 'LineWidth', 4);
set(gca, 'FontSize', 18);
xlabel('Days of sleep deprivation'); ylabel('Reaction Time (RT)');