Our new pre-print is now available online! 🥳 Here I will summarise our main findings and the impact of our work in the 14-day observational study in RA, known as weaRAble-PRO.
Manuscript Abstract
Digital measures of health status captured during daily life could greatly augment current in-clinic assessments for rheumatoid arthritis (RA), which will enable better assessment of disease progression and impact. This work presents results from weaRAble-PRO, a 14-day observational study, which aimed to investigate how digital health technologies (DHT), such as smartphones and wearables, could augment patient reported outcomes (PRO) to identify RA status and severity in a study of 30 moderate-to-severe RA patients, compared to 30 matched healthy controls (HC). Sensor-based measures of health status, mobility, dexterity, fatigue, and other RA specific symptoms were extracted from daily iPhone guided tests (GT), as well as actigraphy and heart rate sensor data, which were passively recorded from patients' Apple smartwatch continuously over the study duration. We subsequently developed a machine learning (ML) framework to distinguish RA status and to estimate RA severity. It was found that daily wearable sensor-outcomes robustly distinguished RA from HC participants (F1, 0.807). Furthermore, by day 7 of the study (half-way), a sufficient volume of data had been collected to reliably identify the characteristics of RA participants. In addition, we observed that the detection of RA severity levels could be improved by augmenting standard patient reported outcomes with sensor-based features (F1, 0.833) in comparison to using PRO assessments alone (F1, 0.759) and that the combination of modalities could reliably measure continuous RA severity, as determined by the clinician-assessed RAPID-3 score at baseline ($r^2$, 0.692; RMSE, 1.33). The ability to measure the impact of disease on daily life, through objective and remote digital outcomes that are meaningful to patients, paves the way forward to enable the development of more patient-centric and personalised measurements for use in RA clinical trials.
Note: This article is best viewed in "light mode"!
In recent years, the emergence of consumer digital health technologies (DHT) has opened the possibility of developing rich, continuous, and objective measures of rheumatoid arthritis (RA) disease that can be administered remotely outside of standard clinical settings. In this work, we investigated how DHT, in this case a wrist-worn Apple smartwatch and a bespoke iPhone mobile app, could augment patient reported outcomes (PRO) to characterise the impact of RA on the daily life of 30 moderate-to-severe RA patients, compared to 30 matched healthy controls (HC). This observational study, known as "weaRAble-PRO" (GSK212295) and conducted in collaboration with industrial partner GSK plc, demonstrated how smartphone and smartwatch sensor-outcomes could characterise meaningful aspects of RA impairment and physical function impacting daily life.
From these remotely collected wearable sensor-outcomes (such as from the iPhone and Apple smartwatch) we establish how ML can help characterise the impact of RA on daily life. For example, modelling of objective sensor-outcomes could distinguish RA participants from healthy controls (with improved performance when combining the sensor-data from both devices) and could augment standard patient (self-) reported outcomes to remotely estimate RA severity, as measured by the in-clinic RAPID-3 assessment of RA. To the best of our knowledge, these results offer the first comprehensive evaluation of, and insight into, how remote monitoring outcomes in daily life can characterise RA status and severity, which represents an important first step towards the development of more sensitive and patient-centric measurements for use in RA clinical trials and real-world studies.
We know that rheumatoid arthritis (RA) patients follow quite subtle and unpredictable disease courses, which vary from patient to patient, and experience a progressive decline in physical function and quality of life over time. This often leads to disability and difficulty performing many tasks of daily life (think about going shopping, getting dressed, taking your dog on a walk, or caring for your grandchildren). Currently, the gold-standard methods of measuring the impact of RA on daily life rely on infrequent clinical visits, which may occur only every 3 to 4 months, with assessments depending on a combination of subjective clinician-determined scores
The concept behind our work is to use digital health technologies (DHT) (typically these are consumer-grade mobile apps, smartphones, and wearable devices
Unfortunately, there remains a lack of sufficient evidence for how DHT can provide objective insights into the impact of therapies for RA. In particular, the relative benefit of sensor-outcomes generated from prescribed active assessments versus passive monitoring has not yet been explored within the same study. Furthermore, while digitised PROs enhance patients' ability to frequently record disease activity
The GSK weaRAble-PRO study (GSK212295)
Note: this work details a sub-study of weaRAble-PRO; trial design, feasibility, participant adherence, and other primary related study outcomes will be published as part of a complementary manuscript.
The foundation of this work employs the latest advances in machine learning (ML), such as our self-supervised learning (SSL) methodology, which enabled a robust estimate of participants' daily activity to be generated over the 14-day study. Building on our previously released work by Yuan et al.
In this study, we build upon our previous work by adding a temporal dependency to the DCNN (SSL) through a hidden Markov model (HMM), which was appended to obtain a more accurate sequence of predicted activities over the continuous study period. It was found that the DCNN (SSL) + HMM improved activity estimation in Capture-24 ($\kappa$, 0.862 ± 0.088; F1, 0.815 ± 0.103) as compared to a baseline random forest (RF) + HMM approach ($\kappa$, 0.813 ± 0.108; F1, 0.775 ± 0.117).
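The HMM smoothing step can be illustrated with a minimal sketch of Viterbi decoding. It assumes the network emits per-window class probabilities and that the transition matrix and prior were estimated separately; using the classifier probabilities directly as emission scores is a simplification. The two states, the probabilities, and the transition values below are made up for illustration and are not taken from the study.

```python
import math

def viterbi(probs, transition, prior):
    """Most likely state sequence given per-window class probabilities.

    probs:      list of per-window probability lists (one value per state)
    transition: transition[i][j] = P(state j at t+1 | state i at t)
    prior:      prior[i] = P(state i at the first window)
    """
    n_states = len(prior)
    eps = 1e-12  # avoids log(0)

    def log(p):
        return math.log(p + eps)

    # Log-score of the best path ending in each state at the first window.
    score = [log(prior[s]) + log(probs[0][s]) for s in range(n_states)]
    backpointers = []

    for obs in probs[1:]:
        new_score, pointers = [], []
        for j in range(n_states):
            # Best previous state to transition from into state j.
            best_i = max(range(n_states),
                         key=lambda i: score[i] + log(transition[i][j]))
            new_score.append(score[best_i] + log(transition[best_i][j]) + log(obs[j]))
            pointers.append(best_i)
        score = new_score
        backpointers.append(pointers)

    # Trace back from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return path[::-1]

# Hypothetical example: state 0 = "sedentary", state 1 = "walking".
window_probs = [[0.9, 0.1], [0.8, 0.2], [0.4, 0.6], [0.9, 0.1], [0.85, 0.15]]
sticky_transition = [[0.9, 0.1], [0.1, 0.9]]
uniform_prior = [0.5, 0.5]

raw = [max(range(2), key=lambda s: p[s]) for p in window_probs]
smoothed = viterbi(window_probs, sticky_transition, uniform_prior)
```

In this toy example the third window is labelled "walking" when taken on its own, but the smoothed sequence stays "sedentary" because a single isolated switch is unlikely under the assumed transition matrix.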
For more information on our SSL model, check out our recent blog post.
Wearable sensor-based features were derived from the smartphone during the active guided tasks and passively from the smartwatch during daily life. "Active" features were extracted from smartphone sensor-based measurements during the prescribed guided tests, and aimed to capture specific aspects of RA physical function, related to pain, dexterity, mobility and fatigue. In addition, "passive" features were extracted from smartwatch sensor-based measurements, collected continuously in the background over the 14-day period. Daily activity predictions from the ML SSL model were summarised into general features measuring activity levels, period, duration, and type of activity, as well as sleep detection and sleeping patterns.
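As a rough illustration of how per-window activity predictions might be summarised into daily passive features, here is a minimal sketch. The 30-second window length, the activity names, and the feature names are assumptions made for this example, not the study's actual feature definitions.

```python
from collections import Counter

WINDOW_SECONDS = 30  # assumed window length, not taken from the study

def summarise_day(labels, window_seconds=WINDOW_SECONDS):
    """Summarise one day's per-window activity labels into simple features."""
    if not labels:
        return {}
    counts = Counter(labels)
    total = len(labels)
    features = {}
    for activity, n in counts.items():
        features[f"{activity}_minutes"] = n * window_seconds / 60
        features[f"{activity}_percent"] = 100 * n / total

    # Count uninterrupted bouts (runs of consecutive identical labels).
    bouts = Counter()
    previous = None
    for label in labels:
        if label != previous:
            bouts[label] += 1
        previous = label
    for activity, n in bouts.items():
        features[f"{activity}_bouts"] = n
    return features

# Hypothetical (very short) day of predicted labels.
day = ["sleep"] * 4 + ["sedentary"] * 3 + ["walking"] * 2 + ["sedentary"] * 1
features = summarise_day(day)
```

Features like these would then be averaged per participant over the study days before modelling.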
Using the sensor-based outcomes that we developed, we next explored how state-of-the-art machine learning (ML) models could characterise the impact of RA on the daily life of the participants in the 14-day weaRAble-PRO study. Multivariate modelling aimed to explore the ability of active, passive, and PRO measures to: (1) distinguish RA participants from healthy controls (HC) and (2) estimate RA disease severity, by separating RA participants with moderate symptoms (RA mod) from those with severe symptoms (RA sev). Both were framed as binary classification tasks. Expansions of this analysis subsequently investigated how the in-clinic RAPID-3 assessment, a continuous measure of RA severity, could be estimated from the combination of PRO and sensor-based outcomes.
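Since both tasks are binary classifications scored with F1, a small sketch of how that metric is computed may help. The averaging variant used in the study is not stated in this post, so this example simply computes F1 for a single positive class, and the labels below are invented.

```python
def f1_score_binary(y_true, y_pred, positive=1):
    """F1 score (harmonic mean of precision and recall) for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical labels: 1 = RA, 0 = healthy control.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
score = f1_score_binary(y_true, y_pred)
```

Here one RA participant is missed and one control is flagged, giving precision and recall of 0.75 each.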
We first outlined how regularised linear regression (LR) models, with combinations of $\ell_1$ and $\ell_2$ priors, such as LR-lasso ($\ell_1$), LR-ridge ($\ell_2$), and LR-elastic-net ($\ell_1$ + $\ell_2$), could yield predictive, yet sparse model solutions for estimating RA status. Further regularisation extensions were also investigated using the sparse-group lasso (SG-lasso), an extension of the lasso that promotes both group sparsity (through a group lasso penalty on the $\ell_2$ norm of each group's coefficients) and within-group parameter-wise sparsity (through the $\ell_1$ lasso penalty). It therefore aims to yield a sparse set of groups and also a sparse set of covariates in each selected group
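For reference, the sparse-group lasso is commonly written as the penalised objective below. This is the standard textbook form rather than the exact parameterisation used in the study; the symbols are assumptions introduced here: $\mathcal{L}$ is the model loss, $\beta^{(g)}$ are the coefficients of feature group $g$ (of size $p_g$), $\lambda \geq 0$ sets the overall regularisation strength, and $\alpha \in [0, 1]$ balances the two penalties.

$$
\hat{\beta} = \arg\min_{\beta} \; \mathcal{L}(\beta) + (1 - \alpha)\,\lambda \sum_{g=1}^{G} \sqrt{p_g}\, \lVert \beta^{(g)} \rVert_2 + \alpha\,\lambda\, \lVert \beta \rVert_1
$$

Setting $\alpha = 1$ recovers the ordinary lasso, while $\alpha = 0$ gives the group lasso, which keeps or drops whole feature groups together.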
In this work, we detailed how raw data collected from smartphone and smartwatch sensors can be transformed into sensor-based outcomes that are reflective of disease status. Consistent with previous studies, many remotely collected smartphone sensor-outcomes distinguished RA participants and RA severity levels. For example, it was observed that joint ROM features differentiated HC and RA groups, a similar finding to our previous work
Activity monitoring revealed distinct differences between the groups. For example, the percentage of the day spent in moderate-to-vigorous physical activity, and similar features, were significantly lower in the RA population compared to healthy controls, a similar finding to that of Prioreschi, et al.
Our work is the first study to combine active smartphone and passive wearable measurements to distinguish RA status and measure variations in RA severity. While models trained on only passive features tended to marginally outperform models trained solely on active guided test features, combining both active + passive features led to the best performance in RA identification for all models investigated. Interestingly, it was found that completely different RA subjects were misclassified by active versus passive models.
In addition, further experiments with the LR-SG-lasso determined that mainly activity monitoring domain features were needed in order to identify RA participants. This indicates that we do not always need to prescribe all guided test assessments, or to parse all activity feature domains, but that a small number of prescribed assessments can be sufficient to characterise RA status. For example, future studies could include only the lie-to-stand assessment rather than also prescribing the similar and highly correlated sit-to-stand assessment. They could also remove the prescribed walking assessment (shown to have little predictive value in the weaRAble-PRO study) and use passive daily life walking predictions generated from the activity recognition model instead, which could reduce patient burden.
We also observed that after collecting 7 days of sensor-data in the weaRAble-PRO study, a sufficient volume of data had already been recorded to reliably distinguish RA participants from a healthy population: participant feature reliability (as measured by ICC values) stabilised at good-to-excellent levels, maximal identification performance of RA participants plateaued, and there was no additional benefit to averaging over a fortnight's worth of data versus a week. Therefore, provided that at least one week's worth of sensor data is collected, it might be more beneficial to gather less data from a greater number of participants, rather than a greater duration of sensor data from the same participants.
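The reliability check described above can be sketched as follows. The ICC form used in the study is not stated in this post; this example assumes the one-way random-effects ICC of the k-day mean, often written ICC(1,k), and the daily feature values are invented.

```python
def icc_1k(data):
    """One-way random-effects ICC for the mean of k measurements, ICC(1,k).

    data: list of per-participant lists, each holding k >= 2 daily values.
    """
    n = len(data)
    k = len(data[0])
    grand_mean = sum(sum(row) for row in data) / (n * k)
    row_means = [sum(row) / k for row in data]

    # Between-participant and within-participant mean squares.
    ms_between = k * sum((m - grand_mean) ** 2 for m in row_means) / (n - 1)
    ms_within = sum(
        (x - m) ** 2 for row, m in zip(data, row_means) for x in row
    ) / (n * (k - 1))
    return (ms_between - ms_within) / ms_between

# Hypothetical daily values of one feature for three participants.
daily_values = [
    [10, 12, 11, 13],
    [20, 19, 21, 20],
    [15, 14, 16, 15],
]

# Reliability of the feature when averaging over the first 2, 3, or 4 days.
icc_by_days = {
    days: icc_1k([row[:days] for row in daily_values]) for days in (2, 3, 4)
}
```

Plotting values like `icc_by_days` against the number of days is one way to see where reliability stops improving.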
We found that combining patient-reported outcomes (PRO) and objective sensor-outcomes could better capture RAPID-3-based RA severity at baseline than PROs alone; most estimated RAPID-3 scores correctly stratified participants across severity levels from healthy to moderate to severe RA, suggesting that sufficient information to characterise RA disease severity could be reflected in the remote monitoring outcomes derived in the 14-day weaRAble-PRO study. To the best of the authors' knowledge, this offers the first evaluation of, and insight into, how remote monitoring outcomes in daily life can estimate in-clinic administered assessments of RA impact.
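Agreement between estimated and clinician-assessed RAPID-3 scores is reported in the abstract as r2 and RMSE. A minimal sketch of both metrics is given below; it assumes r2 means the coefficient of determination (the post does not say whether it is this or a squared correlation), and the scores are invented for illustration.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error between observed and estimated scores."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(y_true))

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - residual SS / total SS."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

# Hypothetical observed and estimated RAPID-3 scores.
rapid3_true = [0.0, 1.0, 4.0, 7.0, 8.0]
rapid3_pred = [1.0, 1.0, 5.0, 6.0, 8.0]

fit_r2 = r_squared(rapid3_true, rapid3_pred)
fit_rmse = rmse(rapid3_true, rapid3_pred)
```

RMSE is expressed in the same units as the RAPID-3 score, which makes it easier to interpret clinically than r2 alone.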
There are a number of limitations that must be considered in the weaRAble-PRO study. Despite rich individual level measurements, the study recruited a relatively small sample size (HC, n=30; RA, n=30). As such, a degree of variability and uncertainty existed in constructing cross-validated models to distinguish RA participants, RA severity levels, or estimate the in-clinic RAPID-3 assessment. There are also limitations associated with modelling a clinician-administered assessment, or clinical labels formulated from in-clinic assessments. A degree of variability and uncertainty existed in modelling the RAPID-3, or RA severity levels, so extrapolating these results to RA more generally is not possible without larger cohorts and further external validation. Furthermore, the RAPID-3 was assessed at baseline, with participants recalling the prior week, yet the PRO and sensor-based features were calculated as averages over the subsequent 14-day trial period from baseline. As such, the baseline RAPID-3 may not have precisely reflected the participant's disease status recorded earlier, due to the underlying mutability and heterogeneity of RA symptoms over short periods of time. The subjectivity of PRO predictors should also be considered. For instance, pain or perceived quality of sleep is relative, and some healthy participants recorded experiencing pain or affected sleep in PRO questionnaires. As a result, some PRO values pushed HC RAPID-3 predictions above zero, i.e., indicating the presence of RA symptoms, although non-zero estimated RAPID-3 predictions for HCs were generally low ($<$2).
Our findings in the weaRAble-PRO study demonstrate how digital health technology (DHT) captured sensor-outcomes, recorded from smartphone-based active tests and continuously collected passive smartwatch-based monitoring, could characterise meaningful aspects of rheumatoid arthritis (RA) impairment and physical function impacting daily life. Remotely collected wearable sensor-outcomes could distinguish RA participants from healthy controls, with further improved performance when combining the sensor-data from both devices, and objective sensor-outcomes could augment patient (self-) reported outcomes to remotely estimate RA severity. Furthermore, by the half-way point of the weaRAble-PRO study (day 7), a sufficient volume of data had already been collected to reliably distinguish the characteristics of RA participants.
The weaRAble-PRO study typifies how continuously collected patient self-reported and sensor-based outcomes may more closely reflect participant perceived and experienced symptoms that impact daily life. While in-clinic assessments are considered the gold-standard means of assessing disease severity in RA, it is clear that remotely collected, continuous, patient-centric measurements generated from PRO and sensor-based outcomes offer promising insights that can undoubtedly augment in-clinic assessments for RA. We believe that our work, the first comprehensive evaluation of how remote sensor data can augment traditional PRO measures to estimate clinician-determined RA severity, helps inform future DHT study design to better characterise the impact of RA on daily life, and ultimately to expand the use of DHT to develop more sensitive and patient-centric endpoints in RA clinical trials and real-world studies.
Apple Watch sensor processing was performed using a bespoke version of the biobankAccelerometerAnalysis toolkit, found at: https://github.com/OxWearables/biobankAccelerometerAnalysis.
Deep networks were built using Python v3.7 through a PyTorch v1.7 framework. Our self-supervised learning activity prediction code and trained models are publicly available at: https://github.com/OxWearables/ssl-wearables, including models pre-trained on 100K participants in the UK Biobank.
Some guided test exercises and calculated health metrics are proprietary to Apple ResearchKit (http://researchkit.org/) and Apple HealthKit (https://developer.apple.com/documentation/healthkit); check these out for more details.
Statistical and machine learning analysis was developed using scikit-learn v1.1.1. Further analysis code can be made available by reaching out to me by email at andrew.creagh@eng.ox.ac.uk.
If you use our work, please consider citing:
@article{creagh2022digital,
title = {Digital health technologies and machine learning augment patient reported outcomes to remotely characterise rheumatoid arthritis},
author = {Creagh, A. P. and Hamy, V. and Yuan, H. and Mertes, G. and Tomlinson, R. and Chen, W-H. and Williams, R. and Llop, C. and Yee, C. and Duh, M-H. and Doherty, A. and Garcia-Gancedo, L. and Clifton, D. A.},
elocation-id = {2022.11.18.22282305},
year = {2022},
doi = {10.1101/2022.11.18.22282305},
publisher = {Cold Spring Harbor Laboratory Press},
html = {https://www.medrxiv.org/content/10.1101/2022.11.18.22282305v1},
journal = {medRxiv},
abbr = {medRxiv},
}