Sleepstudy dataset example

We'll fit the "sleepstudy" dataset, a standard demo dataset in R, with three different mixed effects models:

fitlme, Matlab's mixed effects model
IGLS, CANlab's Iterative Generalized Least Squares 2-level model
glmfit_multilevel, CANlab's two-stage reweighted summary statistics model

Table of Contents

Dataset description Description Format Load the dataset fitlme IGLS: Iterative generalized least squares glmfit_multilevel

Dataset description

Description

The average reaction time per day (in milliseconds) for subjects in a sleep deprivation study.

Days 0-1 were adaptation and training (T1/T2), day 2 was baseline (B); sleep deprivation started after day 2.

Format

A data frame with 180 observations on the following 3 variables.

Reaction: Average reaction time (ms)

Days: Number of days of sleep deprivation

Subject: Subject number on which the observation was made.

Details

These data are from the study described in Belenky et al. (2003), for the most sleep-deprived group (3 hours time-in-bed) and for the first 10 days of the study, up to the recovery period. The original study analyzed speed (1/(reaction time)) and treated day as a categorical rather than a continuous predictor.

References

Gregory Belenky, Nancy J. Wesensten, David R. Thorne, Maria L. Thomas, Helen C. Sing, Daniel P. Redmond, Michael B. Russo and Thomas J. Balkin (2003) Patterns of performance degradation and restoration during sleep restriction and subsequent recovery: a sleep dose-response study. Journal of Sleep Research 12, 1--12.

Load the dataset

Before you start, you'll want the dataset on your path:

basedir = '/Users/f003vz1/Dropbox (Dartmouth College)/COURSES/Courses_Dartmouth/2023_Fall_Computational_Foundations';

g = genpath(fullfile(basedir));

addpath(g);

sleepstudy = readtable('sleepstudy.csv');

fitlme

lme = fitlme(sleepstudy,'Reaction ~ Days + (Days | Subject)');
disp(lme)
Linear mixed-effects model fit by ML

Model information:
    Number of observations             180
    Fixed effects coefficients           2
    Random effects coefficients         36
    Covariance parameters                4

Formula:
    Reaction ~ 1 + Days + (1 + Days | Subject)

Model fit statistics:
    AIC       BIC       LogLikelihood    Deviance
    1763.9    1783.1    -875.97          1751.9  

Fixed effects coefficients (95% CIs):
    Name                   Estimate    SE        tStat     DF     pValue        Lower     Upper 
    {'(Intercept)'}        251.41      6.6323    37.906    178    3.6634e-87    238.32    264.49
    {'Days'       }        10.467      1.5022    6.9678    178    6.0243e-11    7.5028    13.432

Random effects covariance parameters (95% CIs):
Group: Subject (18 Levels)
    Name1                  Name2                  Type            Estimate    Lower       Upper  
    {'(Intercept)'}        {'(Intercept)'}        {'std' }         23.781       15.017     37.658
    {'Days'       }        {'(Intercept)'}        {'corr'}        0.08132     -0.50558    0.61677
    {'Days'       }        {'Days'       }        {'std' }         5.7168       3.8055     8.5882

Group: Error
    Name               Estimate    Lower    Upper 
    {'Res Std'}        25.592      22.8     28.725

Note: The degrees of freedom and P-values in the fixed effects table are wrong!

For valid estimates of the dfe, you must correct the df using the Satterthwaite method, or do model comparisons to test significance levels. I prefer the former because it's straightforward.

[~, ~, fixedstats] = fixedEffects(lme, 'DFMethod', 'Satterthwaite');
disp(fixedstats)
    Fixed effect coefficients: DFMethod = 'Satterthwaite', Alpha = 0.05

    Name                   Estimate    SE        tStat     DF    pValue        Lower     Upper 
    {'(Intercept)'}        251.41      6.6323    37.906    18    1.2668e-18    237.47    265.34
    {'Days'       }        10.467      1.5022    6.9678    18    1.6524e-06    7.3112    13.623
 

% This is another way to do it:

anova(lme, 'DFMethod', 'Satterthwaite')

The default is maximum likelihood estimation, so let's try it using ReML instead:

lme = fitlme(sleepstudy,'Reaction ~ Days + (Days | Subject)', 'FitMethod','REML');
disp(lme)
Linear mixed-effects model fit by REML

Model information:
    Number of observations             180
    Fixed effects coefficients           2
    Random effects coefficients         36
    Covariance parameters                4

Formula:
    Reaction ~ 1 + Days + (1 + Days | Subject)

Model fit statistics:
    AIC       BIC       LogLikelihood    Deviance
    1755.6    1774.7    -871.81          1743.6  

Fixed effects coefficients (95% CIs):
    Name                   Estimate    SE        tStat     DF     pValue        Lower     Upper 
    {'(Intercept)'}        251.41      6.8246    36.838    178    3.3479e-85    237.94    264.87
    {'Days'       }        10.467      1.5458    6.7715    178    1.7872e-10    7.4169    13.518

Random effects covariance parameters (95% CIs):
Group: Subject (18 Levels)
    Name1                  Name2                  Type            Estimate    Lower       Upper 
    {'(Intercept)'}        {'(Intercept)'}        {'std' }           24.74      15.582    39.283
    {'Days'       }        {'(Intercept)'}        {'corr'}        0.065551    -0.51839    0.6078
    {'Days'       }        {'Days'       }        {'std' }          5.9221      3.9183    8.9508

Group: Error
    Name               Estimate    Lower    Upper 
    {'Res Std'}        25.592      22.8     28.725
 
[~, ~, fixedstats] = fixedEffects(lme, 'DFMethod', 'Satterthwaite');
disp(fixedstats)
    Fixed effect coefficients: DFMethod = 'Satterthwaite', Alpha = 0.05

    Name                   Estimate    SE        tStat     DF    pValue        Lower     Upper 
    {'(Intercept)'}        251.41      6.8246    36.838    17    1.1709e-17    237.01     265.8
    {'Days'       }        10.467      1.5458    6.7715    17    3.2638e-06     7.206    13.729
 

Let's get the BLUPs, the Best Linear Unbiased Predictors.

These are estimates of the individual parameters for each person, with shrinkage of high-variance estimates towards the group values.
They are deviations around the group (population) values, so we have to add the population parameters to get the individual slopes.
These are what Doug Bates calls conditional means, E[beta|Y=y], evaluated at the estimated parameters

[~, ~, randomstats] = randomEffects(lme);

blups = reshape(randomstats.Estimate, 2, length(unique(randomstats.Level)))';

levelnames = unique(randomstats.Level);

bnames = unique(randomstats.Name, 'stable')

bnames = 2×1 cell

'(Intercept)'
'Days'       

blups = array2table(blups, 'Variablenames', bnames, 'RowNames', levelnames )

blups = 18×2 table 

	(Intercept)	Days
1 308	2.2586	9.1990
2 309	-40.3986	-8.6197
3 310	-38.9603	-5.4489
4 330	23.6905	-4.8143
5 331	22.2602	-3.0699
6 332	9.0395	-0.2722
7 333	16.8404	-0.2236
8 334	-7.2326	1.0746
9 335	-0.3337	-10.7522
10 337	34.8904	8.6283
11 349	-25.2101	1.1734
12 350	-13.0700	6.6142
13 351	4.5778	-3.0153
14 352	20.8636	3.5360
15 369	3.2755	0.8722
16 370	-25.6129	4.8225
17 371	0.8070	-0.9882
18 372	12.3145	1.2840

% Now create the plot

% ---------------------------

indiv_slopes = table2array(blups) + lme.Coefficients.Estimate';

days = length(unique(sleepstudy.Days));

% This is a local function created to show a simple plot of slopes

plot_indiv_slopes(indiv_slopes, days);

IGLS: Iterative generalized least squares

% To use IGLS and glmfit_multilevel, we need to reformat the dataset to
% cell arrays
 
sleepunstacked = unstack(sleepstudy,'Reaction','Days');
Warning: Table variable names that were not valid MATLAB identifiers have been modified. Since table variable names must be unique, any table variable names that happened to match the new identifiers also have been modified.
To use the original INDVAR values as table variable names, set 'VariableNamingRule' to 'preserve'.
sleepunstacked = table2array(sleepunstacked(:, 2:end))';
[n, m] = size(sleepunstacked);  % n = days, m  = subjects
sleep_rt_cell = mat2cell(sleepunstacked, n, ones(1, m));
 
daycode = ((1:n)') - 1 ;
X1 = repmat({daycode}, 1, m);
 

% IGLS requires CANlab tools

% see also rigls.m for ReML estimation

igls_stats = igls(sleep_rt_cell, X1, 'plot', 'all');

MST (diag of Ystar_mtx): 2251.39771
MSE (diag of resid): -0.00001

igls.m output report:
---------------------------------------
Data:  10 observations x  18 subjects 
Fit Type: igls
AR(p) model: No
Converged: Yes
Max iterations:   5, Completed iterations:   1
Epsilon for convergence: 0.010000
Elapsed time: 0.29 s

Statistics: Tests of inference on fixed population parameters
Parameter	est.	t( 17)	p	
Intcpt.	251.405	39.55	0.000000	***
Slope1	10.467	7.06	0.000002	***


Statistics: Tests of significance on random effects (variances)
Parameter	est.	LRT	p	
Intcpt.	497.258	22.09	0.000000	***
Slope1	31.449	42.01	0.000000	***

---------------------------------------

% BLUPs

% Create the same simple plot as before, to compare

indiv_slopes = igls_stats.beta_indiv';

plot_indiv_slopes(indiv_slopes, days);

Here is a different plot, with different color options.

This plots the naive estimates of individual slopes, not the BLUPs. But you can see that they are very close to the BLUPS in this case.

This requires CANlab tools)

create_figure('example');

handles = line_plot_multisubject(X1, sleep_rt_cell, 'colors', seaborn_colors(length(X1)));

Warnings:

____________________________________________________________________________________________________________________________________________
Y: input cells with low varability:
2
____________________________________________________________________________________________________________________________________________

____________________________________________________________________________________________________________________________________________
Input data:
X scaling: No centering or z-scoring
Y scaling: No centering or z-scoring
                
Transformations:
No data transformations before plot
                                                                   
Correlations:                                                      
r = 0.54 across all observations, based on untransformed input data
                                                                
Stats on slopes after transformation, subject is random effect: 
Mean b = 10.47, t( 17) = 6.77, p = 0.000003, num. missing:   0  
                                                                
Average within-person r = 0.71 +- 0.38 (std)
 
Between-person r (across subject means) = NaN, p = NaN
____________________________________________________________________________________________________________________________________________

set(handles.point_handles, 'MarkerSize', 8, 'Marker', 'o');

set(gca, 'FontSize', 18);

xlabel('Days of sleep deprivation'); ylabel('Reaction Time (RT)');

glmfit_multilevel

glmfit_stats = glmfit_multilevel(sleep_rt_cell, X1, [], 'names', {'Days'}, 'verbose', 'weighted', 'plot');

Analysis description: Second Level of Multilevel Model  
GLS analysis

Observations:  18, Predictors:   1
Outcome names: Intercept	Days	
Weighting option: weighted
Inference option: t-test

Other Options:
	Plots: Yes
	Robust: no
	Save: No
	Output file: glmfit_general_output.txt
	R & B - type Empirical Bayes reweighting.
Total time: 0.04 s

________________________________________

Second Level of Multilevel Model 
	Outcome variables:
	Intercept	Days	
Adj. mean	250.61	10.11	

2nd-level B01
	Intercept	Days	
Coeff	250.61	10.11	
STE	7.14	1.55	
t	35.09	6.53	
Z	8.21	4.54	
p	0.00000	0.00001	

________________________________________

function plot_indiv_slopes(indiv_slopes, days)
 
figure; hold on;
m = size(indiv_slopes, 1); % number of subjects
colors = seaborn_colors(m);
 
for i = 1:m
    
    % plot individual line
    plot([0 days], [indiv_slopes(i, 1) indiv_slopes(i, 1) + days * indiv_slopes(i, 2)], 'color', colors{i}, 'LineWidth', 2);
    
end
 
% plot group
% b = lme.Coefficients.Estimate;
% plot([0 days], [b(1) b(1) + days * b(2)], 'color', 'k', 'LineWidth', 4);
 
set(gca, 'FontSize', 18);
xlabel('Days of sleep deprivation'); ylabel('Reaction Time (RT)');
end