Sample dataset: Back pain clinical trial

We'll load and examine a sample dataset. This dataset is from:
Ashar et al. 2022, JAMA Psychiatry
This study is a randomized clinical trial (RCT) of patients with chronic back pain (>= 4/10 pain for at least 3 months, average duration ~10 years). The trial compares Pain Reprocessing Therapy, Placebo (a single injection into the skin of the back), and Usual Care treatments, with pre-treatment and post-treatment (1 month after treatment initiation), and follow-up to 1 year.
Table of Contents
Prep and Examine the dataset

Dependencies and setup

Do you have the relevant files on your Matlab path?
which Ashar2022_outcomes_demographics.csv
/Users/f003vz1/Dropbox (Dartmouth College)/COURSES/Courses_Dartmouth/2023_Fall_Computational_Foundations/Datasets/Ashar_2022_PRT_backpain/Ashar2022_outcomes_demographics.csv
If you get an error, you don't!
You also need the Statistics and Machine Learning Toolbox in Matlab. Try this to check:
if isempty(ver('stats')), warning('You do not have the statistics toolbox on your Matlab path. You may/will get errors'); end

Load the dataset

We'll read the dataset from a Comma Separated Value (csv) file into a table object. Table is a Matlab class that is very convenient, as it:
(1) stores metadata like table names and descriptions, and
(2) has associated methods for manipulating the dataset (e.g. selecting cases, etc.)
In addition, some statistics functions (like fitlme, for mixed effect models) can take table objects directly, allowing you to reference variables by name
% Sometimes, it may be helpful to change to a directory and add it to your path.
% Then Matlab can find files
 
cd('/Users/f003vz1/Dropbox (Dartmouth College)/COURSES/Courses_Dartmouth/2023_Fall_Computational_Foundations/Datasets/Ashar_2022_PRT_backpain')
addpath(pwd)
backpain_all = readtable('Ashar2022_outcomes_demographics.csv');
 
backpain_all % print (some of) the table to screen
backpain_all = 341×35 table
 idtimegrouppain_avgbpi_intensitybpi_interferenceodipromis_deppromis_angerpromis_anxietypromis_sleeppanas_papanas_napcstsk11sopa_emopgictx_satisfactionalcoholopioidcannabiseducationethnicityhispanicemployment_statusexercisehandednessssesmarried_or_living_as_marriage3132333435
1121322.500011220132417111413178NaNNaN00224013140211151NaN31
2122332.75002.1429122218282510161122815200024013140211151NaN31
3141322.50001.1429268591120511154NaNNaN2303.500034034190661202NaN351
41423321.28572010512112099226130.50002103.500034034190661202NaN351
5151322.25001101098191172216NaNNaN150025013161411851NaN101
6152332.75001.857112978221371194150.5000290025013161411851NaN101
7181254.50003.7143189139281584278NaNNaN00034013151362001NaN51
8182222.25001.5714161181128107628955020034013151362001NaN51
9231232.50001.571410179173019103208NaNNaN2900.01503201516025105221.50001
10232222.25001.428612141017181773198117.500013.500000.02503201516025105221.50001
11241144.50003.14291812161521231015288NaNNaN1101034014130281731231
12242110.50000.857121181615201512219681.50004017.500034014130281731231
13291133.75002.28571812109201985228NaNNaN00034025170231302NaN31
14292100.25000.4286410711161893129610020034025170231302NaN31
15321165.75004.714318101211101885278NaNNaN9014324014371281851NaN41
16322100.25000.142948799187013869311016024014371281851NaN41
17381165.500012616141716161018199NaNNaN350034012181571322NaN151
18382111.7500012107122121701410799.500022.50000034012181571322NaN151
195112441.428616858111956137NaNNaN1003203213025175121.25001
20512253.75000.7143141369157801365804003203213025175121.25001
21611254.50003.714314111212239816247NaNNaN30034013260241542251
22612211.25001.14291610811238812245691.5000200034013260241542251
23671165.2500216889161854138NaNNaN140034013170291202251
24691211.25001.428688108271950114NaNNaN00024114141391751NaN151
25692200.50000.1429285818195111624000024114141391751NaN151
267213444.85713210881617623267NaNNaN292034014171641801211
27722354.50004.57143213108171373330712460134014171641801211
28791344.50002.142920111411151470238NaNNaN1003401218146200220.75001
29792333.25001.7143189992018701486500003401218146200220.75001
308512110.42861886891951178NaNNaN0003403411016416512351
31871244.50003.2857161513211812612239NaNNaN90033012181372502211
32872222.25002.71431215112117785221045790033012181372502211
33961276.25008.142952201927201112152710NaNNaN3800240331413414022201
34962276.75008.2857442520263110161726104603200240331413414022201
351021165.75006.85715215112325121019277NaNNaN1300340341816116011181
3610221333.8571361110202720791886911443340341816116011181
371031243.75003.14293823112528151213248NaNNaN23144240341306410921261
381032232.75003.14293818915281585237353.500024144240341306410921261
3910413221161710331612159247NaNNaN201.500034014171271252241
401042321.75000.5714181611311813139237154.5000201034014171271252241
411081322.25001.1429168810191251215NaNNaN00034012171382102261
4210823220.857118889201263153110.500030034012171382102261
4311211222.7143181815241918178208NaNNaN000340132602922012101
4411221110.285721291814181021110691700340132602922012101
451131175.2500764181522241711242310NaNNaN028034023171591692141
461132143.50003.85715012152223181182410697.5000024034023171591692141
471151143.75002.285716169121816710266NaNNaN2800340111815715722101
481152110.75000.2857418520161874158691.50002800340111815715722101
491171155.25005.142926222412198129198NaNNaN802340131813413022151
5011721444.142922161812281482111051001204340131813413022151
5112512433.28572423719232076167NaNNaN109340241816015222301
521252232.50000.8571221981717228614711.5000000340241816015222301
5314211221.714320101115121878303NaNNaN280153401319147136121.50001
541422100.500004911148226815107100310153401319147136121.50001
5514612332.5714221415182915792210NaNNaN80334013151311282281
561462243.75001.7143241416212612710249135.500070234013151311282281
571471354.7500314888132450149NaNNaN50034013181351602231
5814723221.8571128781323501210678.500070034013181351602231
591481233.50002.8571201415151515811309NaNNaN150142411334131150122.50001
601482222216118111813511248468.5000110282411334131150122.50001
611651243.75002.857118131312131987257NaNNaN130334013230281851221
621721233.75002.42861215142126111014229NaNNaN280024022180211851261
631722255.50002.857116181922325982010134.5000190024022180211851261
641921354.75005.428636813112910620287NaNNaN90034012170241362271
6519223555.428628101012319811257249.5000140034012170241362271
661941333.25002.14291610910211774248NaNNaN1003401328131120220.75001
671942333288911181871168252.50003003401328131120220.75001
681951132.75002.14291611142229251028337NaNNaN15010340242902513022151
69195210000961427206015107100902340242902513022151
702191354.50003.85712818171833151011317NaNNaN95034014181371641215.25001
7122112110.428610888152256193NaNNaN4503340131814421512101
7222122110.857110899162162193371.50005000340131814421512101
732281322.50002.14292010108121975247NaNNaN5200240231415616512201
742282311.25000.857120988111363247677.50003600240231415616512201
752481255.25001.857110111311271082177NaNNaN220024013150222551241
762482254.50001.142912101111291181148323.5000120024013150222551241
772621143.250032024142221131522348NaNNaN50534013130251152231
782622110.75000.571441510141319852010686.500030534013130251152231
792951333.2500218109131716108258NaNNaN11014350241702316022131
802952344.25002.71432010913221610142382304016350241702316022131
813121133.25000.714320101219141896206NaNNaN1101.250024033370252351291
823122110.75000.4286417141912101331110692.50002301424033370252351291
8331911656.428630192014341310103110NaNNaN600340121814823012321
843192100.2500021081216237114107100600340121814823012321
853241165.50006.85714621182633191537329NaNNaN390034011170611901281
863242111.75001.71431815122124127192710692300034011170611901281
873301344.50003.42861486825861157NaNNaN80034014160241701241
8833023221.857110859211160113134.500060034014160241701241
893391344.50001.71433098132015652510NaNNaN300340122702314022101
9034112443.1429161611192514831268NaNNaN170725033120221701241
913412222.50001.285710111413211799157359140725033120221701241
923481333.50004302014252717920289NaNNaN30035013181301202251
9334823221.4286161510172614119151056820035013181301202251
943521365.25006.85712824152324912213110NaNNaN4200340111714615622141
953641345.50000.8571188108222379132NaNNaN100.230034035171241202271
963642321.75000.571414851021176414223002.200034035171241202271
9737811332.4286182519302391412277NaNNaN00024024140342152271
9837821110.714382417261611104241068900024024140342152271
9938913333229142324181011267NaNNaN300340231613417822111
10038923443.1429268915251988296121800340231613417822111

Examine some of the key variables

It's always critical to look at your data! Check that the distributions and values make sense.
We'll work with a few key variables, though there are many more interesting ones to examine!
backpain_all.id; Subject ID
backpain_all.time; Assessment time (1 = pre-treatment, 2 = post)
backpain_all.group; Treatment group (1 = PRT, 2 = Placebo, 3 = Usual Care,
NaN = Healthy (nonpatient, not randomized)
backpain_all.bpi_intensity; Primary outcome, pain intensity
backpain_all.gender; Sex/gender (not diff here; 1 = male, 2 = fem, 3 = other)
logical(backpain_all.is_patient); 1 = patient, 0 = healthy control (not randomized/treated)
How many subjects?
length(unique(backpain_all.id))
ans = 206
How many patients?
is_patient = logical(backpain_all.is_patient); % 1 = patient, 0 = healthy control (not randomized/treated)
length(unique(backpain_all.id(is_patient)))
ans = 152
Calculate descriptive stats for all variables, and view stats for "group"
descriptive_stats = summary(backpain_all);
descriptive_stats.group
ans = struct with fields:
Size: [341 1] Type: 'double' Description: '' Units: '' Continuity: [] Min: 1 Median: 2 Max: 3 NumMissing: 55
How many subjects assessed per group and time point?
% Tabulate two variables
tab = crosstab(backpain_all.group, backpain_all.time);
tab = 3×2
50 44 51 44 50 47
 
% Print the results with labels (and save table object in tab)
tab = table(tab(:, 1), tab(:, 2), 'VariableNames',{'T1' 'T2'}, 'RowNames', {'PRT' 'Placebo' 'Control'})
tab = 3×2 table
 T1T2
1 PRT5044
2 Placebo5144
3 Control5047
 
Create a smaller, reduced dataset with only a few variables we need
This is called backpain_stacked because the dataset is "stacked" in "long form", with multiple rows per participant. This is a repeated measures design, and participants thus have a row for each visit ("Time 1" pre-treatment and "Time 2" post-treatment).
Long-format is convenient for storing a dataset in a single frame and for mixed effects models, which keep track of which observations are collected on the same participants (and other dependencies) internally.
This has patients only, because the controls have NaN values for group.
backpain_stacked = backpain_all(:, {'id' 'time' 'group' 'bpi_intensity' 'gender' 'is_patient'})
backpain_stacked = 341×6 table
 idtimegroupbpi_intensitygenderis_patient
112132.500011
212232.750011
314132.500021
41423221
515132.250011
615232.750011
718124.500011
818222.250011
923122.500021
1023222.250021
1124114.500011
1224210.500011
1329113.750021
1429210.250021
1532115.750011
1632210.250011
1738115.500021
1838211.750021
195112411
2051223.750011
2161124.500021
2261221.250021
2367115.250021
2469121.250011
2569220.500011
267213411
2772234.500011
2879134.500021
2979233.250021
308512111
3187124.500021
3287222.250021
3396126.250021
3496226.750021
35102115.750011
3610221311
37103123.750021
38103222.750021
3910413221
40104231.750021
41108132.250021
4210823221
4311211211
4411221111
45113115.250021
46113213.500021
47115113.750021
48115210.750021
49117115.250021
5011721421
5112512321
52125222.500021
5314211211
54142210.500011
5514612321
56146223.750021
57147134.750021
5814723221
59148123.500011
6014822211
61165123.750011
62172123.750011
63172225.500011
64192134.750021
6519223521
66194133.250021
6719423321
68195112.750021
6919521021
70219134.500011
7122112111
7222122111
73228132.500011
74228231.250011
75248125.250011
76248224.500011
77262113.250021
78262210.750021
79295133.250021
80295234.250021
81312113.250011
82312210.750011
8331911511
84319210.250011
85324115.500011
86324211.750011
87330134.500011
8833023211
89339134.500021
9034112411
91341222.500011
92348133.500021
9334823221
94352135.250021
95364135.500021
96364231.750021
9737811321
9837821121
9938913321
10038923421
Unstack the dataset to match outcome values within-person
For standard single-level GLMs and repeated measures models (paired t-tests, repeated measures ANOVA), the "long format" dataset is inappropriate. We want a "wide format" dataset, with one row per person, and repeated measures from the person in different variables (columns).
backpain = unstack(backpain_stacked,'bpi_intensity','time', 'NewDataVariableNames', {'BPI_Time1', 'BPI_Time2'})
backpain = 151×6 table
 idgroupgenderis_patientBPI_Time1BPI_Time2
1123112.50002.7500
2143212.50002
3153112.25002.7500
4182114.50002.2500
5232212.50002.2500
6241114.50000.5000
7291213.75000.2500
8321115.75000.2500
9381215.50001.7500
105121143.7500
11612214.50001.2500
12671215.2500NaN
13692111.25000.5000
147231144.5000
15793214.50003.2500
16852111NaN
17872214.50002.2500
18962216.25006.7500
191021115.75003
201032213.75002.7500
2110432121.7500
221083212.25002
2311211121
241131215.25003.5000
251151213.75000.7500
261171215.25004
2712522132.5000
2814211120.5000
2914622133.7500
301473214.75002
311482113.50002
321652113.7500NaN
331722113.75005.5000
341923214.75005
351943213.25003
361951212.75000
372193114.5000NaN
3822121111
392283112.50001.2500
402482115.25004.5000
412621213.25000.7500
422953213.25004.2500
433121113.25000.7500
4431911150.2500
453241115.50001.7500
463303114.50002
473393214.5000NaN
4834121142.5000
493483213.50002
503523215.2500NaN
513643215.50001.7500
5237812131
5338932134
544162215.50003
5543131144.2500
564342214NaN
574501213NaN
5846032177.2500
594642211.5000NaN
604712113.25001.7500
614771115.25002.5000
6250131143.2500
635351214.50000.2500
645441213NaN
6554522131.7500
6657512130.7500
6757922121.2500
685863214.25002.7500
695981213.25000
7059931144
716022215.7500NaN
726041112.50003.2500
736051215.25000.2500
7460722164.7500
756083212.50001.7500
766512111.5000NaN
776642214NaN
786662112.25001.7500
796721214.75002.2500
806761113.5000NaN
816932115.50005
826983113.50002
836991113.50001.2500
8470331131.2500
857041213.50003
8671121133
877133213.75002.2500
887433212.25002
897592115.75004
907713213.75005
9177411151
9278231131
937913113.25003.5000
9479212140.5000
957963214.50002.7500
968141115.50000.2500
9781812141.7500
988321112.2500NaN
998371215.75000
10084321121
Now, rows correspond to participants, and columns correspond to measured variables.
Missing data are coded with NaN (Not a Number), so participants who are missing a time point will have NaN values for that time point.
Let's add a column for the change in pain (improvement), where negative values mean more improvement:
backpain.BPI_change = backpain.BPI_Time2 - backpain.BPI_Time1;
 
Create sub-tables with only participants in each group
We do this for convenience, so we can extract data and plot it easily.
prt = backpain(backpain.group == 1, :);
placebo = backpain(backpain.group == 2, :);
control = backpain(backpain.group == 3, :);
 
Save simple tables
We can save simple versions of the tables for later, so that we can download and work with them in any software package.
writetable(backpain, 'Ashar_backpain_simple_wide.csv');
writetable(backpain_stacked, 'Ashar_backpain_simple_long.csv');

Visualize the data

Let's create some plots of the improvement in BPI (Brief Pain Inventory) by group
 
data_to_plot = {prt.BPI_change, placebo.BPI_change, control.BPI_change};
colors = seaborn_colors(3);
 
create_figure('Violins');
barplot_columns(data_to_plot, 'Improvement by group', 'colors', colors, 'names', {'PRT', 'Placebo' 'Control'});
Col 1: PRT Col 2: Placebo Col 3: Control --------------------------------------------- Tests of column means against zero --------------------------------------------- Name Mean_Value Std_Error T P Cohens_d ___________ __________ _________ _______ __________ ________ {'PRT' } -2.6648 0.21657 -12.304 2.2204e-15 -1.8549 {'Placebo'} -0.89773 0.18523 -4.8466 1.6705e-05 -0.73065 {'Control'} -0.51596 0.1531 -3.3701 0.0015294 -0.49158
ylabel('BPI change, T2-T1')
xlabel('Group');
 
 
create_figure('Lines');
line1 = lineplot_columns([prt.BPI_Time1 prt.BPI_Time2], 'w', 3, 'color', colors{1}, 'markerfacecolor', colors{1}./1.5);
line2 = lineplot_columns([placebo.BPI_Time1 placebo.BPI_Time2], 'w', 3, 'color', colors{2}, 'markerfacecolor', colors{2}./1.5);
line3 = lineplot_columns([control.BPI_Time1 control.BPI_Time2], 'w', 3, 'color', colors{3}, 'markerfacecolor', colors{3}./1.5);
ylabel('BPI change, T2-T1')
set(gca, 'XLim', [0.5 2.5], 'XTick', 1:2, 'XTickLabel', {'Time 1' 'Time 2'})
 

Code hygiene

To keep the code clean, here are some things I attempted to do:
To keep the code clean, here are some things I did NOT do: