The mDOT Center

Transforming health and wellness via temporally-precise mHealth interventions
mDOT@MD2K.org
901.678.1526
 

CP 3: Operationalizing Behavioral Theory for mHealth: Dynamics, Context, and Personalization


Collaborating Investigator:

Dr. Predrag Klasnja, University of Michigan

 

Funding Status: 

1U01CA229445-01

NIH/NCI

9/19/18 – 8/31/22

 

Associated with:

TR&D1, TR&D2

Heteroscedastic Temporal Variational Autoencoder for Irregularly Sampled Time Series
Authors:
Publication Venue:

International Conference on Learning Representations (ICLR)

Publication Date:

January 28, 2022

Keywords:

irregular sampling, uncertainty, imputation, interpolation, multivariate time series, missing data, variational autoencoder

Related Project:

This project models and represents uncertainty in mHealth biomarkers in order to account for multifaceted uncertainty during momentary decision making when selecting, adapting, and delivering temporally-precise mHealth interventions. In this period, we extended our previous deep learning approach, Multi-Time Attention Networks, to enable improved representation of output uncertainty. Our new approach preserves the idea of learned temporal similarity functions and adds heteroscedastic output uncertainty. The new framework, the Heteroscedastic Temporal Variational Autoencoder, models real-valued multivariate data.

Abstract:

Irregularly sampled time series commonly occur in several domains where they present a significant challenge to standard deep learning models. In this paper, we propose a new deep learning framework for probabilistic interpolation of irregularly sampled time series that we call the Heteroscedastic Temporal Variational Autoencoder (HeTVAE). HeTVAE includes a novel input layer to encode information about input observation sparsity, a temporal VAE architecture to propagate uncertainty due to input sparsity, and a heteroscedastic output layer to enable variable uncertainty in output interpolations. Our results show that the proposed architecture is better able to reflect variable uncertainty through time due to sparse and irregular sampling than a range of baseline and traditional models, as well as recently proposed deep latent variable models that use homoscedastic output layers.
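
As a rough illustration of the heteroscedastic output idea, the sketch below (a minimal PyTorch-style module; the names HeteroscedasticOutput and gaussian_nll and the shapes are ours, not the HeTVAE reference implementation) has the decoder predict a per-point mean and variance and trains with a Gaussian negative log-likelihood rather than a fixed-variance reconstruction term:

```python
import math
import torch
import torch.nn as nn

class HeteroscedasticOutput(nn.Module):
    """Decoder head that outputs a per-observation mean and variance."""
    def __init__(self, hidden_dim: int, obs_dim: int):
        super().__init__()
        self.mean = nn.Linear(hidden_dim, obs_dim)
        self.log_var = nn.Linear(hidden_dim, obs_dim)

    def forward(self, h: torch.Tensor):
        mu = self.mean(h)
        var = torch.exp(self.log_var(h))  # point-specific output variance
        return mu, var

def gaussian_nll(mu, var, y):
    # Replaces a homoscedastic (fixed-variance) MSE term: sparsely observed points
    # can be assigned large variance instead of being forced toward overly smooth means.
    return 0.5 * (torch.log(var) + math.log(2 * math.pi) + (y - mu) ** 2 / var).mean()
```

Because the variance is predicted separately at every query time point, interpolations can be confident near densely observed regions and appropriately uncertain across long gaps between observations.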

TL;DR:

We present a new deep learning architecture for probabilistic interpolation of irregularly sampled time series.

BayesLDM: A Domain-Specific Language for Probabilistic Modeling of Longitudinal Data
Authors:
Publication Venue:

IEEE/ACM international conference on Connected Health: Applications, Systems and Engineering Technologies (CHASE)

Publication Date:

September 12, 2022

Keywords:

Bayesian inference, probabilistic programming, time series, missing data, Bayesian imputation, mobile health

Related Projects:

We have developed a toolbox for the specification and estimation of mechanistic models in the dynamic Bayesian network family. This toolbox focuses on making it easier to specify probabilistic dynamical models for time series data and to perform Bayesian inference and imputation in the specified model given incomplete data as input. The toolbox is referred to as BayesLDM. We have been working with members of CP3, CP4, and TR&D2 to develop offline data analysis and simulation models using this toolbox. We are also currently in discussions with members of CP4 to deploy the toolbox's Bayesian imputation methods within a live controller optimization trial in the context of an adaptive walking intervention.

Abstract:

In this paper we present BayesLDM, a system for Bayesian longitudinal data modeling consisting of a high-level modeling language with specific features for modeling complex multivariate time series data coupled with a compiler that can produce optimized probabilistic program code for performing inference in the specified model. BayesLDM supports modeling of Bayesian network models with a specific focus on the efficient, declarative specification of dynamic Bayesian Networks (DBNs). The BayesLDM compiler combines a model specification with inspection of available data and outputs code for performing Bayesian inference for unknown model parameters while simultaneously handling missing data. These capabilities have the potential to significantly accelerate iterative modeling workflows in domains that involve the analysis of complex longitudinal data by abstracting away the process of producing computationally efficient probabilistic inference code. We describe the BayesLDM system components, evaluate the efficiency of representation and inference optimizations and provide an illustrative example of the application of the system to analyzing heterogeneous and partially observed mobile health data.
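
For flavor, the sketch below hand-writes a small first-order autoregressive dynamic Bayesian network directly in NumPyro, the kind of probabilistic program such a compiler can emit; it does not use the BayesLDM modeling language itself, and the model, variable names, and data are illustrative. Time points recorded as NaN are treated as latent and imputed jointly with the unknown parameters:

```python
import numpy as np
import jax.random as random
import numpyro
import numpyro.distributions as dist
from numpyro.infer import MCMC, NUTS

def ar1_dbn(y):
    """AR(1) dynamic Bayesian network over a partially observed series y."""
    alpha = numpyro.sample("alpha", dist.Normal(0.0, 1.0))
    beta = numpyro.sample("beta", dist.Normal(0.0, 1.0))
    sigma = numpyro.sample("sigma", dist.HalfNormal(1.0))
    x = numpyro.sample("x_0", dist.Normal(0.0, 1.0),
                       obs=None if np.isnan(y[0]) else y[0])
    for t in range(1, len(y)):
        obs_t = None if np.isnan(y[t]) else y[t]   # NaN -> latent node to impute
        x = numpyro.sample(f"x_{t}", dist.Normal(alpha + beta * x, sigma), obs=obs_t)

y = np.array([0.2, np.nan, 0.5, 0.7, np.nan, 1.1])  # toy series with missing entries
mcmc = MCMC(NUTS(ar1_dbn), num_warmup=500, num_samples=1000)
mcmc.run(random.PRNGKey(0), y=y)
mcmc.print_summary()  # posteriors for alpha, beta, sigma and the imputed x_1, x_4
```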

TL;DR:

We present a toolbox for the specification and estimation of mechanistic models in the dynamic Bayesian network family.

Assessing the Impact of Context Inference Error & Partial Observability on RL Methods for Just-In-Time Adaptive Interventions
Authors:
Publication Venue:

Conference on Uncertainty in Artificial Intelligence (UAI 2023)

Publication Date:

May 17, 2023

Keywords:

reinforcement learning, partial observability, context inference, adaptive interventions, empirical evaluation, mobile health

Related Project:
Abstract:

Just-in-Time Adaptive Interventions (JITAIs) are a class of personalized health interventions developed within the behavioral science community. JITAIs aim to provide the right type and amount of support by iteratively selecting a sequence of intervention options from a pre-defined set of components in response to each individual’s time varying state. In this work, we explore the application of reinforcement learning methods to the problem of learning intervention option selection policies. We study the effect of context inference error and partial observability on the ability to learn effective policies. Our results show that the propagation of uncertainty from context inferences is critical to improving intervention efficacy as context uncertainty increases, while policy gradient algorithms can provide remarkable robustness to partially observed behavioral state information.
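
As a toy sketch of what propagating context-inference uncertainty can look like (hypothetical function and variable names; this is not the paper's algorithm or simulation setup), the action values are averaged over the inferred context distribution before an intervention option is selected, instead of plugging in the single most likely context:

```python
import numpy as np

def select_action(context_probs, q_values, rng):
    """context_probs: (n_contexts,) posterior from a context-inference model.
    q_values: (n_contexts, n_actions) estimated proximal-outcome values."""
    expected_q = context_probs @ q_values          # marginalize over context uncertainty
    logits = expected_q - expected_q.max()
    probs = np.exp(logits) / np.exp(logits).sum()  # softmax action-selection policy
    return rng.choice(len(probs), p=probs)

rng = np.random.default_rng(0)
ctx = np.array([0.7, 0.3])                   # e.g., P(stressed), P(not stressed)
q = np.array([[0.2, 0.8],                    # value of {no prompt, prompt} if stressed
              [0.6, 0.1]])                   # ... if not stressed
print(select_action(ctx, q, rng))
```

When the context classifier is confident, this reduces to acting on the inferred context; as its uncertainty grows, the marginalization increasingly hedges between options, which is the kind of behavior the empirical study highlights as important.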

TL;DR:

This work focuses on JITAIs, personalized health interventions that dynamically select support components based on an individual’s changing state. The study applies reinforcement learning methods to learn policies for selecting intervention options, revealing that uncertainty from context inferences is crucial for enhancing intervention efficacy as context uncertainty increases.

Effect-Invariant Mechanisms for Policy Generalization
Authors:
Publication Venue:

arXiv:2306.10983

Publication Date:

June 27, 2023

Keywords:

effect-invariant mechanisms, policy generalization, machine learning

Related Projects:
Abstract:
Policy learning is an important component of many real-world learning systems. A major challenge in policy learning is how to adapt efficiently to unseen environments or tasks. Recently, it has been suggested to exploit invariant conditional distributions to learn models that generalize better to unseen environments. However, assuming invariance of entire conditional distributions (which we call full invariance) may be too strong of an assumption in practice. In this paper, we introduce a relaxation of full invariance called effect-invariance (e-invariance for short) and prove that it is sufficient, under suitable assumptions, for zero-shot policy generalization. We also discuss an extension that exploits e-invariance when we have a small sample from the test environment, enabling few-shot policy generalization. Our work does not assume an underlying causal graph or that the data are generated by a structural causal model; instead, we develop testing procedures to test e-invariance directly from data. We present empirical results using simulated data and a mobile health intervention dataset to demonstrate the effectiveness of our approach.
TL;DR:
In this paper, we introduce a relaxation of full invariance called effect-invariance (e-invariance for short) and prove that it is sufficient, under suitable assumptions, for zero-shot policy generalization. We also discuss an extension that exploits e-invariance when we have a small sample from the test environment, enabling few-shot policy generalization.
Batch Policy Learning in Average Reward Markov Decision Processes
Authors:
Publication Venue:

The Annals of Statistics

Publication Date:

December 21, 2022

Keywords:

average reward, doubly robust estimator, Markov Decision Process, policy optimization

Related Project:
Abstract:

We consider the batch (off-line) policy learning problem in the infinite horizon Markov decision process. Motivated by mobile health applications, we focus on learning a policy that maximizes the long-term average reward. We propose a doubly robust estimator for the average reward and show that it achieves semiparametric efficiency. Further, we develop an optimization algorithm to compute the optimal policy in a parameterized stochastic policy class. The performance of the estimated policy is measured by the difference between the optimal average reward in the policy class and the average reward of the estimated policy and we establish a finite-sample regret guarantee. The performance of the method is illustrated by simulation studies and an analysis of a mobile health study promoting physical activity.
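
Schematically, one common way to build such a doubly robust estimator starts from the average-reward Bellman equation; the article's exact construction and efficiency analysis are more involved, and the notation below is ours. With $\omega^{\pi}(s)$ the ratio of the stationary state distribution under the target policy $\pi$ to that under the behavior policy $\pi_b$, and $\hat Q^{\pi}$, $\hat V^{\pi}$ estimated relative value functions, the average-reward estimate $\hat\eta$ solves

$$0 = \frac{1}{nT}\sum_{i=1}^{n}\sum_{t=0}^{T-1} \hat\omega^{\pi}(S_{i,t})\,\frac{\pi(A_{i,t}\mid S_{i,t})}{\pi_b(A_{i,t}\mid S_{i,t})}\Big\{ R_{i,t} - \eta + \hat V^{\pi}(S_{i,t+1}) - \hat Q^{\pi}(S_{i,t},A_{i,t})\Big\},$$

where $n$ individuals are each observed for $T$ decision points; the estimator remains consistent if either the density-ratio model or the value-function model is correctly specified.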

TL;DR:

We consider batch policy learning in an infinite horizon Markov Decision Process, focusing on optimizing a policy for long-term average reward in the context of mobile health applications.

Data-driven Interpretable Policy Construction for Personalized Mobile Health
Authors:
Publication Venue:

IEEE International Conference on Digital Health (ICDH)

Publication Date:

July 10, 2022

Keywords:

learning systems, optimized production technology, behavioral sciences, electronic healthcare, decision trees

Related Project:
Abstract:

To promote healthy behaviors, many mobile health applications provide message-based interventions, such as tips, motivational messages, or suggestions for healthy activities. Ideally, the intervention policies should be carefully designed so that users obtain the benefits without being overwhelmed by overly frequent messages. As part of the HeartSteps physical-activity intervention, users receive messages intended to disrupt sedentary behavior. HeartSteps uses an algorithm to uniformly spread out the daily message budget over time, but does not attempt to maximize treatment effects. This limitation motivates constructing a policy to optimize the message delivery decisions for more effective treatments. Moreover, the learned policy needs to be interpretable to enable behavioral scientists to examine it and to inform future theorizing. We address this problem by learning an effective and interpretable policy that reduces sedentary behavior. We propose Optimal Policy Trees+ (OPT+), an innovative batch off-policy learning method that combines personalized threshold learning and an extension of Optimal Policy Trees under a budget-constrained setting. We implement and test the method using data collected in HeartSteps V2N3. Computational results demonstrate a significant reduction in sedentary behavior with a lower delivery budget. OPT+ produces a highly interpretable and stable output decision tree, thus enabling theoretical insights to guide future research.
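
To make the idea of a budget-constrained yet interpretable policy concrete, here is a toy sketch using synthetic data and hypothetical feature names; it is a plain scikit-learn distillation of a thresholded treatment-effect rule, not the OPT+ method itself:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(1)
# Hypothetical decision-point features: hour of day, sedentary minutes so far, steps yesterday.
X = rng.uniform([8, 0, 0], [20, 600, 12000], size=(500, 3))
# Hypothetical estimated effect of sending an anti-sedentary message at each decision point.
tau_hat = 0.5 * (X[:, 1] > 240) - 0.2 * (X[:, 0] > 18) + rng.normal(0, 0.1, 500)

budget = 200                                  # send at most 200 of the 500 possible messages
threshold = np.sort(tau_hat)[-budget]
action = (tau_hat >= threshold).astype(int)   # treat only where the estimated effect is largest

# Distill the budgeted rule into a shallow tree a behavioral scientist can read.
policy_tree = DecisionTreeClassifier(max_depth=2).fit(X, action)
print(export_text(policy_tree, feature_names=["hour", "sedentary_min", "steps_yesterday"]))
```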

TL;DR:

We propose Optimal Policy Trees+ (OPT+), a batch off-policy learning method that combines personalized threshold learning with an extension of Optimal Policy Trees under a budget constraint; applied to HeartSteps data, it reduces sedentary behavior with a lower message-delivery budget while producing an interpretable, stable decision tree.

Did We Personalize? Assessing Personalization by an Online Reinforcement Learning Algorithm Using Resampling
Authors:
Publication Venue:

arXiv:2304.05365v6

Publication Date:

August 7, 2023

Keywords:

reinforcement learning, personalization, resampling, exploratory data analysis, mobile health

Related Project:
Abstract:

There is a growing interest in using reinforcement learning (RL) to personalize sequences of treatments in digital health to support users in adopting healthier behaviors. Such sequential decision-making problems involve decisions about when to treat and how to treat based on the user’s context (e.g., prior activity level, location, etc.). Online RL is a promising data-driven approach for this problem as it learns based on each user’s historical responses and uses that knowledge to personalize these decisions. However, to decide whether the RL algorithm should be included in an “optimized” intervention for real-world deployment, we must assess the data evidence indicating that the RL algorithm is actually personalizing the treatments to its users. Due to the stochasticity in the RL algorithm, one may get a false impression that it is learning in certain states and using this learning to provide specific treatments. We use a working definition of personalization and introduce a resampling-based methodology for investigating whether the personalization exhibited by the RL algorithm is an artifact of the RL algorithm stochasticity. We illustrate our methodology with a case study by analyzing the data from a physical activity clinical trial called HeartSteps, which included the use of an online RL algorithm. We demonstrate how our approach enhances data-driven truth-in-advertising of algorithm personalization both across all users as well as within specific users in the study.
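
The logic of the resampling check can be sketched as comparing an observed personalization statistic against a reference distribution generated under algorithm stochasticity alone; everything below is schematic, with placeholder numbers standing in for fitted policies and for the paper's re-running of the RL algorithm (it is not HeartSteps data or the paper's exact procedure):

```python
import numpy as np

rng = np.random.default_rng(0)

def spread(p):
    # One possible personalization statistic: how much the learned treatment
    # probabilities differ across users in a fixed reference state.
    return p.max() - p.min()

# Placeholder for the per-user treatment probabilities the RL algorithm ends up with.
fitted_probs = rng.beta(2.0, 2.0, size=40)
observed = spread(fitted_probs)

# Reference distribution under "stochasticity only": in the methodology this comes from
# re-running the algorithm on data simulated with no user-specific differences; here a
# narrower Beta draw simply stands in for that re-simulation step.
null = np.array([spread(rng.beta(20.0, 20.0, size=40)) for _ in range(2000)])
print("observed:", round(float(observed), 3),
      "null 95th pct:", round(float(np.quantile(null, 0.95)), 3))
```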

TL;DR:

We use a working definition of personalization and introduce a resampling-based methodology for investigating whether the personalization exhibited by the RL algorithm is an artifact of the RL algorithm stochasticity.

The Microrandomized Trial for Developing Digital Interventions: Experimental Design and Data Analysis Considerations
Authors:
Publication Venue:

Psychological Methods

Publication Date:

January 13, 2022

Keywords:
Micro-randomized trial (MRT), health behavior change, digital intervention, just-in-time adaptive intervention (JITAI), causal inference, intensive longitudinal data
Related Project:
Abstract:
Just-in-time adaptive interventions (JITAIs) are time-varying adaptive interventions that use frequent opportunities for the intervention to be adapted: weekly, daily, or even many times a day. The microrandomized trial (MRT) has emerged for use in informing the construction of JITAIs. MRTs can be used to address research questions about whether and under what circumstances JITAI components are effective, with the ultimate objective of developing effective and efficient JITAIs.

The purpose of this article is to clarify why, when, and how to use MRTs; to highlight elements that must be considered when designing and implementing an MRT; and to review primary and secondary analyses methods for MRTs. We briefly review key elements of JITAIs and discuss a variety of considerations that go into planning and designing an MRT. We provide a definition of causal excursion effects suitable for use in primary and secondary analyses of MRT data to inform JITAI development. We review the weighted and centered least-squares (WCLS) estimator which provides consistent causal excursion effect estimators from MRT data. We describe how the WCLS estimator along with associated test statistics can be obtained using standard statistical software such as R (R Core Team, 2019). Throughout we illustrate the MRT design and analyses using the HeartSteps MRT, for developing a JITAI to increase physical activity among sedentary individuals. We supplement the HeartSteps MRT with two other MRTs, SARA and BariFit, each of which highlights different research questions that can be addressed using the MRT and experimental design considerations that might arise.
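
For a binary treatment indicator A_t, the WCLS criterion has, schematically, the following form (notation ours, with details such as availability indicators omitted): minimize

$$\sum_{i=1}^{n}\sum_{t=1}^{T} W_{i,t}\Big\{Y_{i,t+1} - g(H_{i,t})^{\top}\alpha - \big(A_{i,t}-\tilde p_t(1\mid S_{i,t})\big)\, f(S_{i,t})^{\top}\beta\Big\}^{2}, \qquad W_{i,t}=\frac{\tilde p_t(A_{i,t}\mid S_{i,t})}{p_t(A_{i,t}\mid H_{i,t})},$$

where $p_t$ is the randomization probability used in the MRT, $\tilde p_t$ is a researcher-chosen probability depending only on the candidate moderators $S_{i,t}$, $g(H_{i,t})$ collects control variables, and $\hat\beta$ is the estimate of the causal excursion effect.
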
TL;DR:
We clarify why, when, and how to use micro-randomized trials to inform JITAI development, and illustrate the design and analysis considerations with the HeartSteps MRT for developing a JITAI to increase physical activity among sedentary individuals.
Dyadic Reinforcement Learning
Authors:

Linear Mixed Models with Endogenous Covariates: Modeling Sequential Treatment Effects with Application to a Mobile Health Study
Authors:
Publication Venue:

Statistical Science: A Review Journal of the Institute of Mathematical Statistics

Keywords:

causal inference, endogenous covariates, linear mixed model, micro-randomized trial

Publication Date:

October 2020

Related Project:
Abstract:

Mobile health is a rapidly developing field in which behavioral treatments are delivered to individuals via wearables or smartphones to facilitate health-related behavior change. Micro-randomized trials (MRT) are an experimental design for developing mobile health interventions. In an MRT the treatments are randomized numerous times for each individual over the course of the trial. Along with assessing treatment effects, behavioral scientists aim to understand between-person heterogeneity in the treatment effect. A natural approach is the familiar linear mixed model. However, directly applying linear mixed models is problematic because potential moderators of the treatment effect are frequently endogenous; that is, they may depend on prior treatment. We discuss model interpretation and biases that arise in the absence of additional assumptions when endogenous covariates are included in a linear mixed model. In particular, when there are endogenous covariates, the coefficients no longer have the customary marginal interpretation. However, these coefficients still have a conditional-on-the-random-effect interpretation. We provide an additional assumption that, if true, allows scientists to use standard software to fit linear mixed models with endogenous covariates, and person-specific predictions of effects can be provided. As an illustration, we assess the effect of activity suggestions in the HeartSteps MRT and analyze the between-person treatment effect heterogeneity.
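
In symbols, a linear mixed model for such intensive longitudinal outcomes has, schematically, the form (notation ours):

$$Y_{i,t} = X_{i,t}^{\top}\beta + Z_{i,t}^{\top} b_i + \varepsilon_{i,t}, \qquad b_i \sim N(0,\Sigma_b), \quad \varepsilon_{i,t}\sim N(0,\sigma^{2}),$$

where $b_i$ are person-specific random effects. Endogeneity arises when components of $X_{i,t}$ or $Z_{i,t}$ (for example, recent activity) depend on earlier treatments and outcomes; the fixed effects $\beta$ then retain a conditional-on-$b_i$ interpretation but generally lose the usual marginal one.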

TL;DR:

We discuss model interpretation and biases that arise in the absence of additional assumptions when endogenous covariates are included in a linear mixed model. In particular, when there are endogenous covariates, the coefficients no longer have the customary marginal interpretation.

Off-Policy Estimation of Long-Term Average Outcomes with Applications to Mobile Health
Authors:
Publication Venue:

Journal of the American Statistical Association

Keywords:

sequential decision making, policy evaluation, Markov decision process, reinforcement learning

Publication Date:

2021

Related Project:
Abstract:
Due to the recent advancements in wearables and sensing technology, health scientists are increasingly developing mobile health (mHealth) interventions. In mHealth interventions, mobile devices are used to deliver treatment to individuals as they go about their daily lives. These treatments are generally designed to impact a near-time, proximal outcome such as stress or physical activity. The mHealth intervention policies, often called just-in-time adaptive interventions, are decision rules that map an individual's current state (e.g., the individual's past behaviors as well as current observations of time, location, social activity, stress and urges to smoke) to a particular treatment at each of many time points. The vast majority of current mHealth interventions deploy expert-derived policies. In this paper, we provide an approach for conducting inference about the performance of one or more such policies using historical data collected under a possibly different policy. Our measure of performance is the average of proximal outcomes over a long time period should the particular mHealth policy be followed. We provide an estimator as well as confidence intervals. This work is motivated by HeartSteps, an mHealth physical activity intervention.
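
The target estimand here can be written, schematically, as the long-run average of proximal outcomes under the policy of interest (notation ours):

$$\eta^{\pi} \;=\; \lim_{T\to\infty}\frac{1}{T}\,\mathbb{E}_{\pi}\!\left[\sum_{t=1}^{T} Y_{t+1}\right],$$

where $Y_{t+1}$ is the proximal outcome following decision time $t$ and the expectation is over trajectories generated by following the JITAI policy $\pi$; the paper provides an estimator of $\eta^{\pi}$, with confidence intervals, using historical data collected under a possibly different policy.
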
TL;DR:

In this paper, we provide an approach for conducting inference about the performance of one or more such policies using historical data collected under a possibly different policy.

The long-term goal of CP3 is to enable the transformative potential of mHealth by addressing the behavior-theoretic, measurement, modeling, and intervention design challenges and opportunities presented by intensively collected longitudinal data. CP3 will investigate these issues by focusing on physical activity and sedentary behavior. To validate the proposed research, CP3 builds on the NIH-funded HeartSteps trial, which CP3 collaborator Klasnja leads. HeartSteps is a year-long micro-randomized trial (MRT) of an adaptive mHealth intervention based on Social-Cognitive Theory (SCT) that aims to increase walking and decrease sedentary behavior in a cohort of 60 patients with Stage 1 hypertension.

CP3 aims to develop and refine measures of theoretical constructs that influence behaviors and intervention response. Based on methods advanced in NIH's Science of Behavior Change, CP3 will refine measures of dynamic theoretical constructs hypothesized by SCT to shape our target behaviors, as well as develop measures of constructs postulated by Dual Process theories. Measures will be developed or refined to enable modeling of intensive longitudinal data about psychosocial and contextual influences on walking and sedentary behavior at different time scales, from hourly to monthly. HeartSteps employs novel sources of information (e.g., wearable sensors, users' calendars, location, and other smartphone data) to obtain measures that were previously dependent on self-report. CP3 will enrich the existing HeartSteps trial with the developed measures and recruit a second cohort of 60 sedentary, overweight/obese, but otherwise healthy adults. The two HeartSteps cohorts will provide the data needed to validate the proposed measures as well as to support model development and validation. Specifically, CP3 includes research on operationalizing dynamic and contextualized theories of behavior in naturalistic and interventional settings within the dynamic Bayesian network model framework, including learning personalized models and warm-starting personalization from population-level models.
Both TR&D1 and TR&D2 will work with CP3 to ensure that the methods developed are grounded in real-life needs and that the technologies developed are readily usable. The HeartSteps cohort data contain rich multimodal mHealth biomarker time series with complex patterns of noise and missingness (different from the case of the oral health biomarkers of CP2). CP3 will benefit from the uncertainty models produced through this iterative collaboration, while the size and complexity of the data will provide an opportunity for thorough empirical evaluation and validation of the TR&D1 Aim 1 approach. Another important issue for CP3 is the potentially high risk of participant disengagement given the one-year duration of the HeartSteps study. CP3 will work with TR&D1 Aim 3 to develop novel composite scores of disengagement risk and receptivity to engagement interventions. This will provide an opportunity for TR&D1 to extend the methods of Aim 3 to a novel setting that differs significantly from the risk scores related to smoking lapse, dental disease, and other use cases. Temporal triggers and risk factors for disengagement and receptivity will be identified in an iterative process and compiled into the composite scores. Disengagement outcomes from the HeartSteps cohorts will be used to refine and validate the resulting scores.

CP3 will collaborate with TR&D2 on all three specific aims. In particular, CP3 needs to account for delayed effects due to user habituation (Aim 1). CP3 will contribute data and collaborate on constructing the warm-start population-level baseline models for the personalization of decision rules under Aim 2. This will push the boundaries of personalization methods beyond traditional (high variance, low bias) person-specific or (low variance, high bias) population-based algorithms. A fundamental challenge CP3 is confronting is that it utilizes interventions operating at different time scales and with different proximal outcomes. Currently, CP3 assumes that the decision rules for all of these interventions can be learned independently. However, CP3 recognizes that the burden imposed by one type of intervention is likely to spill over and reduce the effectiveness of interventions at other time scales. Thus, the work under Aim 3 of TR&D2 is critical to CP3. CP3 is committed to including both the methods for accommodating delayed effects under Aim 1 and the personalization algorithm in its updated version of the HeartSteps application, and to conducting a feasibility study to inform future research directions of both CP3 and TR&D2. CP3 will provide a real-life evaluation of the methods developed under all three specific aims of TR&D2 and contribute to their iterative refinement, as participants experience these algorithms over the one-year duration of the study.
Category

CP, Heart Disease, Physical Activity, TR&D1, TR&D2
