Sleepstudy dataset example

We'll fit the "sleepstudy" dataset, a standard demo dataset in R, with three different mixed effects models:
Table of Contents

Dataset description

Description

The average reaction time per day (in milliseconds) for subjects in a sleep deprivation study.
Days 0-1 were adaptation and training (T1/T2), day 2 was baseline (B); sleep deprivation started after day 2.

Format

A data frame with 180 observations on the following 3 variables.
Reaction: Average reaction time (ms)
Days: Number of days of sleep deprivation
Subject: Subject number on which the observation was made.
Details
These data are from the study described in Belenky et al. (2003), for the most sleep-deprived group (3 hours time-in-bed) and for the first 10 days of the study, up to the recovery period. The original study analyzed speed (1/(reaction time)) and treated day as a categorical rather than a continuous predictor.
References
Gregory Belenky, Nancy J. Wesensten, David R. Thorne, Maria L. Thomas, Helen C. Sing, Daniel P. Redmond, Michael B. Russo and Thomas J. Balkin (2003) Patterns of performance degradation and restoration during sleep restriction and subsequent recovery: a sleep dose-response study. Journal of Sleep Research 12, 1--12.

Load the dataset

Before you start, you'll want the dataset on your path:
basedir = '/Users/f003vz1/Dropbox (Dartmouth College)/COURSES/Courses_Dartmouth/2023_Fall_Computational_Foundations';
g = genpath(fullfile(basedir));
addpath(g);
sleepstudy = readtable('sleepstudy.csv');

fitlme

lme = fitlme(sleepstudy,'Reaction ~ Days + (Days | Subject)');
disp(lme)
Linear mixed-effects model fit by ML Model information: Number of observations 180 Fixed effects coefficients 2 Random effects coefficients 36 Covariance parameters 4 Formula: Reaction ~ 1 + Days + (1 + Days | Subject) Model fit statistics: AIC BIC LogLikelihood Deviance 1763.9 1783.1 -875.97 1751.9 Fixed effects coefficients (95% CIs): Name Estimate SE tStat DF pValue Lower Upper {'(Intercept)'} 251.41 6.6323 37.906 178 3.6634e-87 238.32 264.49 {'Days' } 10.467 1.5022 6.9678 178 6.0243e-11 7.5028 13.432 Random effects covariance parameters (95% CIs): Group: Subject (18 Levels) Name1 Name2 Type Estimate Lower Upper {'(Intercept)'} {'(Intercept)'} {'std' } 23.781 15.017 37.658 {'Days' } {'(Intercept)'} {'corr'} 0.08132 -0.50558 0.61677 {'Days' } {'Days' } {'std' } 5.7168 3.8055 8.5882 Group: Error Name Estimate Lower Upper {'Res Std'} 25.592 22.8 28.725
Note: The degrees of freedom and P-values in the fixed effects table are wrong!
For valid estimates of the dfe, you must correct the df using the Satterthwaite method, or do model comparisons to test significance levels. I prefer the former because it's straightforward.
[~, ~, fixedstats] = fixedEffects(lme, 'DFMethod', 'Satterthwaite');
disp(fixedstats)
Fixed effect coefficients: DFMethod = 'Satterthwaite', Alpha = 0.05 Name Estimate SE tStat DF pValue Lower Upper {'(Intercept)'} 251.41 6.6323 37.906 18 1.2668e-18 237.47 265.34 {'Days' } 10.467 1.5022 6.9678 18 1.6524e-06 7.3112 13.623
 
% This is another way to do it:
anova(lme, 'DFMethod', 'Satterthwaite')
The default is maximum likelihood estimation, so let's try it using ReML instead:
lme = fitlme(sleepstudy,'Reaction ~ Days + (Days | Subject)', 'FitMethod','REML');
disp(lme)
Linear mixed-effects model fit by REML Model information: Number of observations 180 Fixed effects coefficients 2 Random effects coefficients 36 Covariance parameters 4 Formula: Reaction ~ 1 + Days + (1 + Days | Subject) Model fit statistics: AIC BIC LogLikelihood Deviance 1755.6 1774.7 -871.81 1743.6 Fixed effects coefficients (95% CIs): Name Estimate SE tStat DF pValue Lower Upper {'(Intercept)'} 251.41 6.8246 36.838 178 3.3479e-85 237.94 264.87 {'Days' } 10.467 1.5458 6.7715 178 1.7872e-10 7.4169 13.518 Random effects covariance parameters (95% CIs): Group: Subject (18 Levels) Name1 Name2 Type Estimate Lower Upper {'(Intercept)'} {'(Intercept)'} {'std' } 24.74 15.582 39.283 {'Days' } {'(Intercept)'} {'corr'} 0.065551 -0.51839 0.6078 {'Days' } {'Days' } {'std' } 5.9221 3.9183 8.9508 Group: Error Name Estimate Lower Upper {'Res Std'} 25.592 22.8 28.725
 
[~, ~, fixedstats] = fixedEffects(lme, 'DFMethod', 'Satterthwaite');
disp(fixedstats)
Fixed effect coefficients: DFMethod = 'Satterthwaite', Alpha = 0.05 Name Estimate SE tStat DF pValue Lower Upper {'(Intercept)'} 251.41 6.8246 36.838 17 1.1709e-17 237.01 265.8 {'Days' } 10.467 1.5458 6.7715 17 3.2638e-06 7.206 13.729
 
Let's get the BLUPs, the Best Linear Unbiased Predictors.
[~, ~, randomstats] = randomEffects(lme);
blups = reshape(randomstats.Estimate, 2, length(unique(randomstats.Level)))';
levelnames = unique(randomstats.Level);
bnames = unique(randomstats.Name, 'stable')
bnames = 2×1 cell
'(Intercept)'
'Days'
blups = array2table(blups, 'Variablenames', bnames, 'RowNames', levelnames )
blups = 18×2 table
 (Intercept)Days
1 3082.25869.1990
2 309-40.3986-8.6197
3 310-38.9603-5.4489
4 33023.6905-4.8143
5 33122.2602-3.0699
6 3329.0395-0.2722
7 33316.8404-0.2236
8 334-7.23261.0746
9 335-0.3337-10.7522
10 33734.89048.6283
11 349-25.21011.1734
12 350-13.07006.6142
13 3514.5778-3.0153
14 35220.86363.5360
15 3693.27550.8722
16 370-25.61294.8225
17 3710.8070-0.9882
18 37212.31451.2840
 
% Now create the plot
% ---------------------------
indiv_slopes = table2array(blups) + lme.Coefficients.Estimate';
days = length(unique(sleepstudy.Days));
 
% This is a local function created to show a simple plot of slopes
plot_indiv_slopes(indiv_slopes, days);
 

IGLS: Iterative generalized least squares

% To use IGLS and glmfit_multilevel, we need to reformat the dataset to
% cell arrays
 
sleepunstacked = unstack(sleepstudy,'Reaction','Days');
Warning: Table variable names that were not valid MATLAB identifiers have been modified. Since table variable names must be unique, any table variable names that happened to match the new identifiers also have been modified.
To use the original INDVAR values as table variable names, set 'VariableNamingRule' to 'preserve'.
sleepunstacked = table2array(sleepunstacked(:, 2:end))';
[n, m] = size(sleepunstacked); % n = days, m = subjects
sleep_rt_cell = mat2cell(sleepunstacked, n, ones(1, m));
 
daycode = ((1:n)') - 1 ;
X1 = repmat({daycode}, 1, m);
 
% IGLS requires CANlab tools
% see also rigls.m for ReML estimation
 
igls_stats = igls(sleep_rt_cell, X1, 'plot', 'all');
MST (diag of Ystar_mtx): 2251.39771 MSE (diag of resid): -0.00001
igls.m output report: --------------------------------------- Data: 10 observations x 18 subjects Fit Type: igls AR(p) model: No Converged: Yes Max iterations: 5, Completed iterations: 1 Epsilon for convergence: 0.010000 Elapsed time: 0.29 s Statistics: Tests of inference on fixed population parameters Parameter est. t( 17) p Intcpt. 251.405 39.55 0.000000 *** Slope1 10.467 7.06 0.000002 *** Statistics: Tests of significance on random effects (variances) Parameter est. LRT p Intcpt. 497.258 22.09 0.000000 *** Slope1 31.449 42.01 0.000000 *** ---------------------------------------
 
% BLUPs
% Create the same simple plot as before, to compare
indiv_slopes = igls_stats.beta_indiv';
 
plot_indiv_slopes(indiv_slopes, days);
 
Here is a different plot, with different color options.
This plots the naive estimates of individual slopes, not the BLUPs. But you can see that they are very close to the BLUPS in this case.
This requires CANlab tools)
create_figure('example');
handles = line_plot_multisubject(X1, sleep_rt_cell, 'colors', seaborn_colors(length(X1)));
Warnings: ____________________________________________________________________________________________________________________________________________ Y: input cells with low varability: 2 ____________________________________________________________________________________________________________________________________________ ____________________________________________________________________________________________________________________________________________ Input data: X scaling: No centering or z-scoring Y scaling: No centering or z-scoring Transformations: No data transformations before plot Correlations: r = 0.54 across all observations, based on untransformed input data Stats on slopes after transformation, subject is random effect: Mean b = 10.47, t( 17) = 6.77, p = 0.000003, num. missing: 0 Average within-person r = 0.71 +- 0.38 (std) Between-person r (across subject means) = NaN, p = NaN ____________________________________________________________________________________________________________________________________________
set(handles.point_handles, 'MarkerSize', 8, 'Marker', 'o');
set(gca, 'FontSize', 18);
xlabel('Days of sleep deprivation'); ylabel('Reaction Time (RT)');

glmfit_multilevel

glmfit_stats = glmfit_multilevel(sleep_rt_cell, X1, [], 'names', {'Days'}, 'verbose', 'weighted', 'plot');
Analysis description: Second Level of Multilevel Model GLS analysis Observations: 18, Predictors: 1 Outcome names: Intercept Days Weighting option: weighted Inference option: t-test Other Options: Plots: Yes Robust: no Save: No Output file: glmfit_general_output.txt R & B - type Empirical Bayes reweighting. Total time: 0.04 s ________________________________________ Second Level of Multilevel Model Outcome variables: Intercept Days Adj. mean 250.61 10.11 2nd-level B01 Intercept Days Coeff 250.61 10.11 STE 7.14 1.55 t 35.09 6.53 Z 8.21 4.54 p 0.00000 0.00001 ________________________________________
function plot_indiv_slopes(indiv_slopes, days)
 
figure; hold on;
m = size(indiv_slopes, 1); % number of subjects
colors = seaborn_colors(m);
 
for i = 1:m
% plot individual line
plot([0 days], [indiv_slopes(i, 1) indiv_slopes(i, 1) + days * indiv_slopes(i, 2)], 'color', colors{i}, 'LineWidth', 2);
end
 
% plot group
% b = lme.Coefficients.Estimate;
% plot([0 days], [b(1) b(1) + days * b(2)], 'color', 'k', 'LineWidth', 4);
 
set(gca, 'FontSize', 18);
xlabel('Days of sleep deprivation'); ylabel('Reaction Time (RT)');
end