CP 13: Enabling Robust Adaptation in mHealth Interventions

Collaborating Investigator:

Dr. Predrag Klasnja, University of Michigan

Funding Status:

2R01HL125440

NIH/NHLBI

12/01/14 – 12/31/28

Associated with:

TR&D1, TR&D2 , TR&D3

Significance
Approach & Push Pull Relationship
Publications

Abstract Cardiac Rehabilitation (CR) programs effectively improve cardiovascular outcomes by helping patients adopt heart-healthy lifestyles. However, up to 50% of patients stop exercising or following a healthy diet within months after completing CR, causing many gains to be lost. mHealth offers promising ongoing support after CR, helping patients sustain and enhance their health improvements. Yet, to fulfill this potential, new approaches are needed to boost engagement in mHealth interventions. Patients often abandon these programs due to discouragement from unmet goals, frustration with poorly timed or excessive messaging, or competing life demands. Once abandoned, mHealth interventions lose their ability to support patients further. This project addresses this problem by developing novel algorithms and user interfaces – components that let users interact with the mHealth app – that enable interventions to adapt effectively to changing individual needs and priorities, ensuring sustained usefulness and support over time. The methods will (1) improve intervention delivery by monitoring and optimizing both intermediate behavioral outcomes like physical activity commitment and user engagement, and (2) empower users to adjust intervention behavior by reviewing and correcting the data guiding decision-making, and by specifying support levels that better fit their current circumstances. After developing and optimizing these innovations in a micro-randomized trial with 60 CR patients, we will evaluate their impact on engagement and physical activity in a 9-month randomized controlled trial with 150 CR patients. This trial will compare an intervention with the new adaptive algorithms and interfaces to one with similar behavior change components but without these adaptive features. If successful, this work will enable a new generation of mHealth interventions capable of effectively supporting sustained behavior change for cardiovascular risk reduction and other health domains.

The collaboration creates scalable, adaptive mHealth tools that improve sustained behavior change not just in cardiovascular health but across multiple health domains, making interventions more personalized, effective, and impactful in long term chronic care support.

This collaborator, Klasnja, and his previous CP played a central role in TR&D2’s development of adaptive intervention methods, particularly in analyzing MRT data and advancing RL algorithms. Early work together produced several papers on binary outcome analysis, flexible modeling, and off-policy RL, with multiple pieces highlighted by journal editors. Weekly meetings between investigators and students from both teams fostered deep integration across behavioral theory, causal modeling, and algorithm design. Notably, the CP co-led the development of the pJITAI toolbox, now in user testing, and collaborated on new hierarchical Thompson sampling algorithms and resampling methods to evaluate RL performance. Joint work has continued with ongoing publications and tool deployment efforts alongside other CPs.

TR&D2 is collaborating with CP13 on the construction of the domain science-informed causal directed acyclic graphs (DAGs) and their use for RL algorithm development. This collaboration i impacts data collection in CP13’s upcoming micro-randomized trial (with attention to obtaining the needed variables on the DAG). Further TR&D2 is collaborating on guardrails on the frequency that burdensome queries for information can be made, and TR&D1 Aim 1 is collaborating on a risk score for physical inactivity.

Most recently CP13 has motivated TR&D2 to consider how DAGs can be used to speed up learning by an RL algorithm (TR&D2 Aim 1 in this renewal). Further CP13 is motivating TR&D2 to consider how best to learn the effects of RL “actions” that include burdensome queries for information. An early-stage theoretical paper has resulted due to the former motivation and a second early-stage theoretical paper is underway due the latter motivation. CP13 has challenged TR&D1 Aim 3 to develop methods for synthesizing audio interventions for delivery via a novel hearable platform, a smart ear bud capable of audio messaging.

The collaboration creates scalable, adaptive mHealth tools that improve sustained behavior change in cardiovascular health and other health domains, making interventions more personalized, effective, and impactful in long term chronic care support, and supports the use of emerging hearable platforms.

TR&D1: Discovery

Heteroscedastic Temporal Variational Autoencoder for Irregularly Sampled Time Series

Authors:

Satya Narayan Shukla, Benjamin Marlin

Publication Venue:

International Conference on Learning Representations (ICLR)

Publication Date:

January 28, 2022

Keywords:

irregular sampling, uncertainty, imputation, interpolation, multivariate time series, missing data, variational autoencoder

View Full Paper

Related Project:

CP 3

In order to model and represent uncertainty in mHealth biomarkers to account for multifaceted uncertainty during momentary decision making in selecting, adapting, and delivering temporally-precise mHealth interventions. In this period, we extended our previous deep learning approach, Multi-Time Attention Networks, to enable improved representation of output uncertainty. Our new approach preserves the idea of learned temporal similarity functions and adds heteroskedastic output uncertainty. The new framework is referred to as the Heteroskedastic Variational Autoencoder and models real-valued multivariate data.

Abstract:

Irregularly sampled time series commonly occur in several domains where they present a significant challenge to standard deep learning models. In this paper, we propose a new deep learning framework for probabilistic interpolation of irregularly sampled time series that we call the Heteroscedastic Temporal Variational Autoencoder (HeTVAE). HeTVAE includes a novel input layer to encode information about input observation sparsity, a temporal VAE architecture to propagate uncertainty due to input sparsity, and a heteroscedastic output layer to enable variable uncertainty in output interpolations. Our results show that the proposed architecture is better able to reflect variable uncertainty through time due to sparse and irregular sampling than a range of baseline and traditional models, as well as recently proposed deep latent variable models that use homoscedastic output layers.

TL;DR:

We present a new deep learning architecture for probabilistic interpolation of irregularly sampled time series.

BayesLDM: A Domain-Specific Language for Probabilistic Modeling of Longitudinal Data

Authors:

Karine Tung, Steven De La Torre, Mohamed El Mistiri, Rebecca Braga De Braganca, Eric Hekler, Misha Pavel, Daniel Rivera, Pedja Klasnja, Donna Spruijt-Metz, Benjamin M. Marlin

Publication Venue:

IEEE/ACM international conference on Connected Health: Applications, Systems and Engineering Technologies (CHASE)

Publication Date:

September 12, 2022

Keywords:

Bayesian inference, probabilistic programming, time series, missing data, Bayesian imputation, mobile health

View Full Paper

Related Projects:

CP 3, CP 4

We have developed a toolbox for the specification and estimation of mechanistic models in the dynamic bayesian network family. This toolbox focuses on making it easier to specify probabilistic dynamical models for time series data and to perform Bayesian inference and imputation in the specified model given incomplete data as input. The toolbox is referred to as BayesLDM. We have been working with members of CP3, CP4, and TR&D2 to develop offline data analysis and simulation models using this toolbox. We are also currently in discussions with members of CP4 to deploy the toolbox’s Bayesian imputation methods within a live controller optimization trial in the context of an adaptive walking intervention.

Abstract:

In this paper we present BayesLDM, a system for Bayesian longitudinal data modeling consisting of a high-level modeling language with specific features for modeling complex multivariate time series data coupled with a compiler that can produce optimized probabilistic program code for performing inference in the specified model. BayesLDM supports modeling of Bayesian network models with a specific focus on the efficient, declarative specification of dynamic Bayesian Networks (DBNs). The BayesLDM compiler combines a model specification with inspection of available data and outputs code for performing Bayesian inference for unknown model parameters while simultaneously handling missing data. These capabilities have the potential to significantly accelerate iterative modeling workflows in domains that involve the analysis of complex longitudinal data by abstracting away the process of producing computationally efficient probabilistic inference code. We describe the BayesLDM system components, evaluate the efficiency of representation and inference optimizations and provide an illustrative example of the application of the system to analyzing heterogeneous and partially observed mobile health data.

TL;DR:

We present a a toolbox for the specification and estimation of mechanistic models in the dynamic bayesian network family.

Assessing the Impact of Context Inference Error & Partial Observability on RL Methods for Just-In-Time Adaptive Interventions

Authors:

Karine Tung, Pedja Klasnja, Susan Murphy, Benjamin M. Marlin

Publication Venue:

Conference on Uncertainty in Artificial Intelligence (UAI 2023)

Publication Date:

May 17, 2023

Keywords:

reinforcement learning, partial observability, context inference, adaptive interventions, empirical evaluation, mobile health

View Full Paper

Related Project:

CP 3

Abstract:

Just-in-Time Adaptive Interventions (JITAIs) are a class of personalized health interventions developed within the behavioral science community. JITAIs aim to provide the right type and amount of support by iteratively selecting a sequence of intervention options from a pre-defined set of components in response to each individual’s time varying state. In this work, we explore the application of reinforcement learning methods to the problem of learning intervention option selection policies. We study the effect of context inference error and partial observability on the ability to learn effective policies. Our results show that the propagation of uncertainty from context inferences is critical to improving intervention efficacy as context uncertainty increases, while policy gradient algorithms can provide remarkable robustness to partially observed behavioral state information.

TL;DR:

This work focuses on JITAIs, personalized health interventions that dynamically select support components based on an individual’s changing state. The study applies reinforcement learning methods to learn policies for selecting intervention options, revealing that uncertainty from context inferences is crucial for enhancing intervention efficacy as context uncertainty increases.

TR&D2: Optimization

Counterfactual Inference for Sequential Experimental Design

Authors:

Raaz Dwivedi, Katherine Tian, Sabina Tomkins, Predrag Klasnja, Susan Murphy, Devavrat Shah.

Publication Venue:

arXiv: 2202.06891

Publication Date:

February 14, 2022

Keywords:

sequential experiments, counterfactual inference, adaptive randomization, non-linear factor model, mixed effects model, nearest neighbors.

View Full Paper

Related Projects:

CP 3

Abstract:

We consider after-study statistical inference for sequentially designed experiments wherein multiple units are assigned treatments for multiple time points using treatment policies that adapt over time. Our goal is to provide inference guarantees for the counterfactual mean at the smallest possible scale – mean outcome under different treatments for each unit and each time – with minimal assumptions on the adaptive treatment policy. Without any structural assumptions on the counterfactual means, this challenging task is infeasible due to more unknowns than observed data points. To make progress, we introduce a latent factor model over the counterfactual means that serves as a non-parametric generalization of the non-linear mixed effects model and the bilinear latent factor model considered in prior works. For estimation, we use a non-parametric method, namely a variant of nearest neighbors, and establish a non-asymptotic high probability error bound for the counterfactual mean for each unit and each time. Under regularity conditions, this bound leads to asymptotically valid confidence intervals for the counterfactual mean as the number of units and time points grows to together at suitable rates. We illustrate our theory via several simulations and a case study involving data from a mobile health clinical trial HeartSteps.

TL;DR:

This publication presents a pioneering statistical inference method for sequential experiments where treatments are adaptively assigned to multiple units over time. It addresses the complex challenge of estimating unit-by-time level counterfactual outcomes by introducing a non-linear latent factor model and employing a nearest neighbors approach. The method provides non-asymptotic error bounds and asymptotically valid confidence intervals for counterfactual means, even with adaptive policies that pool data across units, making it highly applicable to real-world scenarios such as mobile health clinical trials like HeartSteps, online education, and personalized recommendations.

Harnessing Causality in Reinforcement Learning With Bagged Decision Times

Authors:

Daiqi Gao, Hsin-Yu Lai, Predrag Klasnja, Susan A. Murphy

Publication Venue:

arXiv: 2410.14659

Publication Date:

October 18, 2024

Keywords:

View Full Paper

Reinforcement Learning (RL), Bagged Decision Times, Causal Directed Acyclic Graph (DAG), Non-Markovian, Non-stationary, Periodic Markov Decision Process (MDP), Dynamical Bayesian Sufficient Statistic (D-BaSS), Mobile Health (mHealth), HeartSteps, Online RL, Bellman Equations, Randomized Least-Squares Value Iteration (RLSVI), State Construction, Mediators.

Related Projects:

CP 3

Abstract:

We consider reinforcement learning (RL) for a class of problems with bagged decision times. A bag contains a finite sequence of consecutive decision times. The transition dynamics are non-Markovian and non-stationary within a bag. All actions within a bag jointly impact a single reward, observed at the end of the bag. For example, in mobile health, multiple activity suggestions in a day collectively affect a user’s daily commitment to being active. Our goal is to develop an online RL algorithm to maximize the discounted sum of the bag-specific rewards. To handle non-Markovian transitions within a bag, we utilize an expert-provided causal directed acyclic graph (DAG). Based on the DAG, we construct states as a dynamical Bayesian sufficient statistic of the observed history, which results in Markov state transitions within and across bags. We then formulate this problem as a periodic Markov decision process (MDP) that allows non-stationarity within a period. An online RL algorithm based on Bellman equations for stationary MDPs is generalized to handle periodic MDPs. We show that our constructed state achieves the maximal optimal value function among all state constructions for a periodic MDP. Finally, we evaluate the proposed method on testbed variants built from real data in a mobile health clinical trial.

TL;DR:

This publication introduces a novel Reinforcement Learning (RL) framework designed for scenarios with “bagged decision times,” where a sequence of actions within a finite period (a “bag”) jointly influences a single reward observed at the end of that period. Unlike traditional RL that often assumes Markovian and stationary transitions, this work addresses non-Markovian and non-stationary dynamics within a bag. The core innovation lies in leveraging an expert-provided causal Directed Acyclic Graph (DAG) to construct states (specifically, a dynamical Bayesian sufficient statistic) that ensure Markovian transitions both within and across bags. This allows the problem to be formulated as a periodic Markov Decision Process (MDP), generalizing existing RL algorithms based on Bellman equations. The proposed online RL algorithm, Bagged RLSVI, is shown to achieve maximal optimal value functions and is evaluated effectively on testbed variants built from real mobile health (mHealth) data (HeartSteps clinical trial), demonstrating its robustness even with misspecified causal assumptions. The research highlights the significant role of mediators in improving the optimal value function.

A Randomized Trial of a Mobile Health Intervention to Augment Cardiac Rehabilitation: The Virtual Application-Supported Environment to INcrease Exercise (VALENTINE) Study

Authors:

Jessica R. Golbus, Kashvi Gupta, Rachel Stevens, V. Swetha E. Jeganathan, Evan Luff, Jieru Shi, Walter Dempsey, Thomas Boyden, Bhramar Mukherjee, Sarah Kohnstamm, Vlad Taralunga, Vik Kheterpal, Susan Murphy, Predrag Klasnja, Sachin Kheterpal, Brahmajee K. Nallamothu

Publication Venue:

Nature
NPJ Digital Medicine

Publication Date:

September 14, 2023

Keywords:

mHealth, Cardiac rehabilitation (CR), Randomized clinical trial, Just-in-time adaptive intervention (JITAI), Smartwatch, Smartphone, Physical activity, 6-minute walk test, Cardiovascular disease, Text messages, Mobile application.

View Full Paper

Related Projects:

CP 3

Abstract:

Mobile health (mHealth) interventions may enhance positive health behaviors, but randomized trials evaluating their efficacy are uncommon. Our goal was to determine if a mHealth intervention augmented and extended benefits of center-based cardiac rehabilitation (CR) for physical activity levels at 6-months. We delivered a randomized clinical trial to low and moderate risk patients with a compatible smartphone enrolled in CR at two health systems. All participants received a compatible smartwatch and usual CR care. Intervention participants received a mHealth intervention that included a just-in-time-adaptive intervention (JITAI) as text messages. The primary outcome was change in remote 6-minute walk distance at 6-months stratified by device type. Here we report the results for 220 participants enrolled in the study (mean [SD]: age 59.6 [10.6] years; 67 [30.5%] women). For our primary outcome at 6 months, there is no significant difference in the change in 6 min walk distance across smartwatch types (Intervention versus control: +31.1 meters Apple Watch, −7.4 meters Fitbit; p = 0.28). Secondary outcomes show no difference in mean step counts between the first and final weeks of the study, but a change in 6 min walk distance at 3 months for Fitbit users. Amongst patients enrolled in center-based CR, a mHealth intervention did not improve 6-month outcomes but suggested differences at 3 months in some users.

TL;DR:

This randomized clinical trial, named the VALENTINE Study, investigated whether a mobile health (mHealth) intervention—consisting of a smartphone application and contextually tailored text messages delivered via smartwatches—could augment and extend the benefits of center-based cardiac rehabilitation (CR) by improving physical activity levels over 6 months. The study enrolled 220 low and moderate risk CR patients who were provided with either an Apple Watch or Fitbit. The primary outcome, change in remote 6-minute walk distance at 6-months, showed no significant difference between the intervention and control groups. While there were suggestive differences at 3 months for Fitbit users regarding 6-minute walk distance, the intervention did not achieve its goal of sustained long-term impact on physical activity.

Estimating Time-Varying Causal Excursion Effects in Mobile Health With Binary Outcomes With Discussion

Authors:

Tianchen Qian, Hyesun Yoo, Predrag Klasnja, Daniel Almirall, Susan A. Murphy

Publication Venue:

Biometrika

Keywords:

binary outcome, causal excursion effect, causal inference, longitudinal data, micro-randomized trials, mobile health, relative risk, semiparametric efficiency theory

Publication Date:

September 2021

View Full Paper

Abstract:

Advances in wearables and digital technology now make it possible to deliver behavioral mobile health interventions to individuals in their everyday life. The micro-randomized trial (MRT) is increasingly used to provide data to inform the construction of these interventions. In an MRT, each individual is repeatedly randomized among multiple intervention options, often hundreds or even thousands of times, over the course of the trial. This work is motivated by multiple MRTs that have been conducted, or are currently in the field, in which the primary outcome is a longitudinal binary outcome. The primary aim of such MRTs is to examine whether a particular time-varying intervention has an effect on the longitudinal binary outcome, often marginally over all but a small subset of the individual’s data. We propose the definition of causal excursion effect that can be used in such primary aim analysis for MRTs with binary outcomes. Under rather restrictive assumptions one can, based on existing literature, derive a semiparametric, locally efficient estimator of the causal effect. We, starting from this estimator, develop an estimator that can be used as the basis of a primary aim analysis under more plausible assumptions. Simulation studies are conducted to compare the estimators. We illustrate the developed methods using data from the MRT, BariFit. In BariFit, the goal is to support weight maintenance for individuals who received bariatric surgery.

TL;DR:

We develop an estimator that can be used as the basis of a primary aim analysis under more plausible assumptions. Simulation studies are conducted to compare the estimators. We illustrate the developed methods using data from the MRT, BariFit. In BariFit, the goal is to support weight maintenance for individuals who received bariatric surgery.

Online Model Selection by Learning How Compositional Kernels Evolve

Authors:

Eura Shin, Predrag Klasnja, Susan A. Murphy, Finale Doshi-Velez

Publication Venue:

Transactions on Machine Learning Research

Publication Date:

November 2023

Keywords:

Online model selection, compositional kernels, Gaussian Process regression, multi-task learning, mobile health (mHealth), Kernel Evolution Model (KEM), personalized learning, bias-variance trade-off, sparsity, stability, adaptive complexity, Dirichlet Process, Chinese Restaurant Process, kernel evolution.

View Full Paper

Related Projects:

CP 3

Abstract:

Motivated by the need for efficient, personalized learning in mobile health, we investigate the problem of online compositional kernel selection for multi-task Gaussian Process regression. Existing composition selection methods do not satisfy our strict criteria in health; selection must occur quickly, and the selected kernels must maintain the appropriate level of complexity, sparsity, and stability as data arrives online. We introduce the Kernel Evolution Model (KEM), a generative process on how to evolve kernel compositions in a way that manages the bias-variance trade-off as we observe more data about a user. Using pilot data, we learn a set of kernel evolutions that can be used to quickly select kernels for new test users. KEM reliably selects high-performing kernels for a range of synthetic and real data sets, including two health data sets.

TL;DR:

This publication introduces the Kernel Evolution Model (KEM), an innovative approach for online compositional kernel selection in multi-task Gaussian Process regression, specifically designed for mobile health (mHealth) applications. KEM addresses the critical need for efficient, personalized learning by training on pilot data to learn how kernel compositions should evolve over time. This allows KEM to quickly select high-performing kernels for new users that are sparse, stable, and possess adaptive complexity, effectively managing the bias-variance trade-off, especially in low-data scenarios.

IntelligentPooling: Practical Thompson Sampling for mHealth

Authors:

Sabina Tomkins, Peng Liao, Predrag Klasnja, Susan Murphy

Publication Venue:

Machine Learning, Volume 110, Pages 2685–2727

Keywords:

Thompson-Sampling, mobile health, reinforcement learning

Publication Date:

September 2021

View Full Paper

Abstract:

In mobile health (mHealth) smart devices deliver behavioral treatments repeatedly over time to a user with the goal of helping the user adopt and maintain healthy behaviors. Reinforcement learning appears ideal for learning how to optimally make these sequential treatment decisions. However, significant challenges must be overcome before reinforcement learning can be effectively deployed in a mobile healthcare setting. In this work we are concerned with the following challenges: 1) individuals who are in the same context can exhibit differential response to treatments 2) only a limited amount of data is available for learning on any one individual, and 3) non-stationary responses to treatment. To address these challenges we generalize Thompson-Sampling bandit algorithms to develop IntelligentPooling. IntelligentPooling learns personalized treatment policies thus addressing challenge one. To address the second challenge, IntelligentPooling updates each user’s degree of personalization while making use of available data on other users to speed up learning. Lastly, IntelligentPooling allows responsivity to vary as a function of a user’s time since beginning treatment, thus addressing challenge three. We show that IntelligentPooling achieves an average of 26% lower regret than state-of-the-art. We demonstrate the promise of this approach and its ability to learn from even a small group of users in a live clinical trial.

TL;DR:

To address the significant challenges that must be overcome before reinforcement learning can be deployed in a mobile healthcare setting, we develop IntelligentPooling by generalizing Thompson-Sampling bandit algorithms.

Effect-Invariant Mechanisms for Policy Generalization

Authors:

Sorawit Saengkyongam, Niklas Pfister, Predrag Klasnja, Susan Murphy, Jonas Peters

Publication Venue:

arXiv:2306.10983

Publication Date:

June 27, 2023

Keywords:

effect-invariant mechanisms, policy generalization, machine learning

View Full Paper

Related Projects:

Cp 3

Abstract:

Policy learning is an important component of many real-world learning systems. A major challenge in policy learning is how to adapt efficiently to unseen environments or tasks. Recently, it has been suggested to exploit invariant conditional distributions to learn models that generalize better to unseen environments. However, assuming invariance of entire conditional distributions (which we call full invariance) may be too strong of an assumption in practice. In this paper, we introduce a relaxation of full invariance called effect-invariance (e-invariance for short) and prove that it is sufficient, under suitable assumptions, for zero-shot policy generalization. We also discuss an extension that exploits e-invariance when we have a small sample from the test environment, enabling few-shot policy generalization. Our work does not assume an underlying causal graph or that the data are generated by a structural causal model; instead, we develop testing procedures to test e-invariance directly from data. We present empirical results using simulated data and a mobile health intervention dataset to demonstrate the effectiveness of our approach.

TL;DR:

In this paper, we introduce a relaxation of full invariance called effect-invariance (e-invariance for short) and prove that it is sufficient, under suitable assumptions, for zero-shot policy generalization. We also discuss an extension that exploits e-invariance when we have a small sample from the test environment, enabling few-shot policy generalization.

Batch Policy Learning in Average Reward Markov Decision Processes

Authors:

Peng Liao, Zhengling Qi, Runzhe Wan, Predrag Klasnja, Susan Murphy

Publication Venue:

The Annals of Statistics

Publication Date:

December 21, 2022

Keywords:

average reward, doubly robust estimator, Markov Decision Process, policy optimization

View Full Paper

Related Project:

CP 3

Abstract:

We consider the batch (off-line) policy learning problem in the infinite horizon Markov decision process. Motivated by mobile health applications, we focus on learning a policy that maximizes the long-term average reward. We propose a doubly robust estimator for the average reward and show that it achieves semiparametric efficiency. Further, we develop an optimization algorithm to compute the optimal policy in a parameterized stochastic policy class. The performance of the estimated policy is measured by the difference between the optimal average reward in the policy class and the average reward of the estimated policy and we establish a finite-sample regret guarantee. The performance of the method is illustrated by simulation studies and an analysis of a mobile health study promoting physical activity.

TL;DR:

We consider batch policy learning in an infinite horizon Markov Decision Process, focusing on optimizing a policy for long-term average reward in the context of mobile health applications.

Data-driven Interpretable Policy Construction for Personalized Mobile Health

Authors:

Dimitris Bertsimas, Predrag Klasnja, Susan Murphy, Liangyuan Na, Susan A. Murphy

Publication Venue:

IEEE International Conference on Digital Health (ICDH)

Publication Date:

July 10, 2022

Keywords:

learning systems, optimized production technology, behavioral sciences, electronic healthcare, decision trees

Access Full Paper

Related Project:

CP 3

Abstract:

To promote healthy behaviors, many mobile health applications provide message-based interventions, such as tips, motivational messages, or suggestions for healthy activities. Ideally, the intervention policies should be carefully designed so that users obtain the benefits without being overwhelmed by overly frequent messages. As part of the HeartSteps physical-activity intervention, users receive messages intended to disrupt sedentary behavior. HeartSteps uses an algorithm to uniformly spread out the daily message budget over time, but does not attempt to maximize treatment effects. This limitation motivates constructing a policy to optimize the message delivery decisions for more effective treatments. Moreover, the learned policy needs to be interpretable to enable behavioral scientists to examine it and to inform future theorizing. We address this problem by learning an effective and interpretable policy that reduces sedentary behavior. We propose Optimal Policy Trees + (OPT+), an innovative batch off-policy learning method, that combines a personalized threshold learning and an extension of Optimal Policy Trees under a budget-constrained setting. We implement and test the method using data collected in HeartSteps V2N3. Computational results demonstrate a significant reduction in sedentary behavior with a lower delivery budget. OPT + produces a highly interpretable and stable output decision tree thus enabling theoretical insights to guide future research.

TL;DR:

Online RL faces challenges like real-time stability and handling complex, unpredictable environments; to address these issues, the PCS framework originally used in supervised learning is extended to guide the design of RL algorithms for such settings, including guidelines for creating simulation environments, as exemplified in the development of an RL algorithm for the mobile health study Oralytics aimed at enhancing tooth-brushing behaviors through personalized intervention messages.

Did We Personalize? Assessing Personalization by an Online Reinforcement Learning Algorithm Using Resampling

Authors:

Susobhan Ghosh, Raphael Kim, Prasidh Chhabria, Raaz Dwivedi, Predrag Klasnja, Peng Liao, Kelly Zhang, Susan Murphy

Publication Venue:

arXiv:2304.05365v6

Publication Date:

August 7, 2023

Keywords:

reinforcement learning, personalization, resampling, exploratory data analysis, mobile health

View Full Paper

Related Project:

CP 3

Abstract:

There is a growing interest in using reinforcement learning (RL) to personalize sequences of treatments in digital health to support users in adopting healthier behaviors. Such sequential decision-making problems involve decisions about when to treat and how to treat based on the user’s context (e.g., prior activity level, location, etc.). Online RL is a promising data-driven approach for this problem as it learns based on each user’s historical responses and uses that knowledge to personalize these decisions. However, to decide whether the RL algorithm should be included in an “optimized” intervention for real-world deployment, we must assess the data evidence indicating that the RL algorithm is actually personalizing the treatments to its users. Due to the stochasticity in the RL algorithm, one may get a false impression that it is learning in certain states and using this learning to provide specific treatments. We use a working definition of personalization and introduce a resampling-based methodology for investigating whether the personalization exhibited by the RL algorithm is an artifact of the RL algorithm stochasticity. We illustrate our methodology with a case study by analyzing the data from a physical activity clinical trial called HeartSteps, which included the use of an online RL algorithm. We demonstrate how our approach enhances data-driven truth-in-advertising of algorithm personalization both across all users as well as within specific users in the study.

TL;DR:

We use a working definition of personalization and introduce a resampling-based methodology for investigating whether the personalization exhibited by the RL algorithm is an artifact of the RL algorithm stochasticity.

The Microrandomized Trial for Developing Digital Interventions: Experimental Design and Data Analysis Considerations

Authors:

Tianchen Qian, Ashley Walton, Linda Collins, Predrag Klasnja, Stephanie Lanza, Inbal Nahum-Shani, Mashfiqui Rabbi, Michael Russell, Maureen Walton, Hyesun Yoo, Susan Murphy

Publication Venue:

Psychological Methods

Publication Date:

January 13, 2022

Keywords:

Micro-randomized trial (MRT), health behavior change, digital intervention, just-in-time adaptive intervention (JITAI), causal inference, intensive longitudinal data

View Full Paper

Related Project:

CP 1, CP 3,

Abstract:

Just-in-time adaptive interventions (JITAIs) are time-varying adaptive interventions that use frequent opportunities for the intervention to be adapted-weekly, daily, or even many times a day. The microrandomized trial (MRT) has emerged for use in informing the construction of JITAIs. MRTs can be used to address research questions about whether and under what circumstances JITAI components are effective, with the ultimate objective of developing effective and efficient JITAI.

The purpose of this article is to clarify why, when, and how to use MRTs; to highlight elements that must be considered when designing and implementing an MRT; and to review primary and secondary analyses methods for MRTs. We briefly review key elements of JITAIs and discuss a variety of considerations that go into planning and designing an MRT. We provide a definition of causal excursion effects suitable for use in primary and secondary analyses of MRT data to inform JITAI development. We review the weighted and centered least-squares (WCLS) estimator which provides consistent causal excursion effect estimators from MRT data. We describe how the WCLS estimator along with associated test statistics can be obtained using standard statistical software such as R (R Core Team, 2019). Throughout we illustrate the MRT design and analyses using the HeartSteps MRT, for developing a JITAI to increase physical activity among sedentary individuals. We supplement the HeartSteps MRT with two other MRTs, SARA and BariFit, each of which highlights different research questions that can be addressed using the MRT and experimental design considerations that might arise.

TL;DR:

Throughout we illustrate the MRT design and analyses using the HeartSteps MRT, for developing a JITAI to increase physical activity among sedentary individuals.

Dyadic Reinforcement Learning

Authors:

Karine Tung, Pedja Klasnja, Susan Murphy, Benjamin M. Marlin

Publication Venue:

Conference on Uncertainty in Artificial Intelligence (UAI 2023)

Publication Date:

May 17, 2023

Keywords:

reinforcement learning, partial observability, context inference, adaptive interventions, empirical evaluation, mobile health

View Full Paper

Related Project:

CP 3

Abstract:

TL;DR:

Linear Mixed Models with Endogenous Covariates: Modeling Sequential Treatment Effects with Application to a Mobile Health Study

Authors:

Tianchen Qian, Predrag Klasnja, Susan Murphy

Publication Venue:

Statistical Science: a review journal of the Institute of Mathematical Statistics

Keywords:

causal inference, endogenous covariates, linear mixed model, micro-randomized trial

Publication Date:

October 2020

View Full Paper

Related Project:

CP 3

Abstract:

Mobile health is a rapidly developing field in which behavioral treatments are delivered to individuals via wearables or smartphones to facilitate health-related behavior change. Micro-randomized trials (MRT) are an experimental design for developing mobile health interventions. In an MRT the treatments are randomized numerous times for each individual over course of the trial. Along with assessing treatment effects, behavioral scientists aim to understand between-person heterogeneity in the treatment effect. A natural approach is the familiar linear mixed model. However, directly applying linear mixed models is problematic because potential moderators of the treatment effect are frequently endogenous-that is, may depend on prior treatment. We discuss model interpretation and biases that arise in the absence of additional assumptions when endogenous covariates are included in a linear mixed model. In particular, when there are endogenous covariates, the coefficients no longer have the customary marginal interpretation. However, these coefficients still have a conditional-on-the-random-effect interpretation. We provide an additional assumption that, if true, allows scientists to use standard software to fit linear mixed model with endogenous covariates, and person-specific predictions of effects can be provided. As an illustration, we assess the effect of activity suggestion in the HeartSteps MRT and analyze the between-person treatment effect heterogeneity.

TL;DR:

We discuss model interpretation and biases that arise in the absence of additional assumptions when endogenous covariates are included in a linear mixed model. In particular, when there are endogenous covariates, the coefficients no longer have the customary marginal interpretation.

Off-Policy Estimation of Long-Term Average Outcomes with Applications to Mobile Health

Authors:

Peng Liao, Predrag Klasnja, Susan Murphy

Publication Venue:

Journal of the American Statistical Association

Keywords:

sequential decision making, policy evaluation, markov decision process, reinforcement learning

Publication Date:

2021

View Full Paper

Related Project:

CP 3

Abstract:

Due to the recent advancements in wearables and sensing technology, health scientists are increasingly developing mobile health (mHealth) interventions. In mHealth interventions, mobile devices are used to deliver treatment to individuals as they go about their daily lives. These treatments are generally designed to impact a near time, proximal outcome such as stress or physical activity. The mHealth intervention policies, often called just-in-time adaptive interventions, are decision rules that map a individual’s current state (e.g., individual’s past behaviors as well as current observations of time, location, social activity, stress and urges to smoke) to a particular treatment at each of many time points. The vast majority of current mHealth interventions deploy expert-derived policies. In this paper, we provide an approach for conducting inference about the performance of one or more such policies using historical data collected under a possibly different policy. Our measure of performance is the average of proximal outcomes over a long time period should the particular mHealth policy be followed. We provide an estimator as well as confidence intervals. This work is motivated by HeartSteps, an mHealth physical activity intervention.

TL;DR:

In this paper, we provide an approach for conducting inference about the performance of one or more such policies using historical data collected under a possibly different policy.

The mDOT Center

CP 13: Enabling Robust Adaptation in mHealth Interventions