Collaborating Investigator:
Dr. Inbal Nahum-Shani, University of Michigan
Funding Status:
NIH/NIDA
9/1/21 – 6/30/26
Associated with:
Algorithms in Decision Support Systems
July 22, 2022
reinforcement learning, online learning, mobile health, algorithm design, algorithm evaluation
Online RL faces challenges such as maintaining real-time stability and handling complex, unpredictable environments. To address these issues, the PCS framework, originally developed for supervised learning, is extended to guide the design of RL algorithms for such settings, including guidelines for constructing simulation environments. The approach is illustrated through the development of an RL algorithm for Oralytics, a mobile health study aimed at improving tooth-brushing behaviors through personalized intervention messages.
Conference on Innovative Applications of Artificial Intelligence (IAAI 2023)
February 7, 2023
reinforcement learning, online learning, mobile health, algorithm design, algorithm evaluation
Dental disease is one of the most common chronic diseases despite being largely preventable. However, professional advice on optimal oral hygiene practices is often forgotten or abandoned by patients. Therefore, patients may benefit from timely and personalized encouragement to engage in oral self-care behaviors. In this paper, we develop an online reinforcement learning (RL) algorithm to optimize the delivery of mobile-based prompts to encourage oral hygiene behaviors. One of the main challenges in developing such an algorithm is ensuring that it accounts for the impact of the current action on the effectiveness of future actions (i.e., delayed effects), especially when the algorithm has been kept simple so that it runs stably and autonomously in a constrained, real-world setting (i.e., with highly noisy, sparse data). We address this challenge by designing a quality reward that maximizes the desired health outcome (i.e., high-quality brushing) while minimizing user burden. We also describe a procedure for optimizing the hyperparameters of the reward by building a simulation environment test bed and evaluating candidates against it. The RL algorithm discussed in this paper will be deployed in Oralytics, an oral self-care app that provides behavioral strategies to boost patient engagement in oral hygiene practices.
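The following is a minimal, purely illustrative sketch of the reward-design and tuning idea described in the abstract; it is not the Oralytics implementation. The variable names, penalty form, and the `simulate_study` callable are all hypothetical stand-ins: the reward credits high-quality brushing, penalizes prompting to limit burden and delayed effects, and the penalty hyperparameters are chosen by evaluating candidates in a simulation test bed.

```python
import numpy as np

def quality_reward(brushing_quality, prompt_sent, recent_prompt_count,
                   burden_penalty, habituation_penalty):
    """Hypothetical reward: health outcome minus the costs of prompting."""
    reward = brushing_quality
    if prompt_sent:
        reward -= burden_penalty                              # immediate cost of a prompt
        reward -= habituation_penalty * recent_prompt_count   # proxy for delayed effects
    return reward

def evaluate_in_testbed(simulate_study, burden_penalty, habituation_penalty, n_reps=50):
    """Average simulated health outcome when the RL algorithm uses this reward.

    `simulate_study` is an assumed user-supplied function that runs the RL
    algorithm (with `quality_reward` inside) in the simulation test bed and
    returns an average brushing-quality outcome for one simulated study.
    """
    outcomes = [simulate_study(burden_penalty, habituation_penalty) for _ in range(n_reps)]
    return np.mean(outcomes)

def tune_reward(simulate_study, grid=(0.0, 0.5, 1.0, 2.0)):
    """Grid search over candidate penalty hyperparameters using the test bed."""
    best, best_value = None, -np.inf
    for b in grid:
        for h in grid:
            value = evaluate_in_testbed(simulate_study, b, h)
            if value > best_value:
                best, best_value = (b, h), value
    return best
```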
April 19, 2023
Online reinforcement learning and other adaptive sampling algorithms are increasingly used in digital intervention experiments to optimize treatment delivery for users over time. In this work, we focus on longitudinal user data collected by a large class of adaptive sampling algorithms that are designed to optimize treatment decisions online using accruing data from multiple users. Combining or “pooling” data across users allows adaptive sampling algorithms to potentially learn faster. However, by pooling, these algorithms induce dependence between the sampled user data trajectories; we show that this can cause standard variance estimators for i.i.d. data to underestimate the true variance of common estimators on this data type. We develop novel methods to perform a variety of statistical analyses on such adaptively sampled data via Z-estimation. Specifically, we introduce the adaptive sandwich variance estimator, a corrected sandwich estimator that leads to consistent variance estimates under adaptive sampling. Additionally, to prove our results we develop novel theoretical tools for empirical processes on non-i.i.d., adaptively sampled longitudinal data which may be of independent interest. This work is motivated by our efforts in designing experiments in which online reinforcement learning algorithms optimize treatment decisions, yet statistical inference is essential for conducting analyses after experiments conclude.
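For context, here is a minimal sketch of the standard (i.i.d.-style) sandwich variance estimator for a least-squares Z-estimator; this is the kind of estimator the paper shows can understate variance when data are pooled by an adaptive sampling algorithm. The adaptive sandwich correction itself is developed in the paper and is not reproduced here; all names below are illustrative.

```python
import numpy as np

def ols_sandwich(X, y):
    """X: (n, d) design matrix, y: (n,) outcomes. Returns estimate and variance."""
    n, d = X.shape
    theta = np.linalg.lstsq(X, y, rcond=None)[0]   # solves sum_i x_i (y_i - x_i' theta) = 0
    resid = y - X @ theta
    psi = X * resid[:, None]                       # estimating-function values psi_i
    bread = X.T @ X / n                            # A = (1/n) sum_i x_i x_i'
    meat = psi.T @ psi / n                         # B = (1/n) sum_i psi_i psi_i'
    bread_inv = np.linalg.inv(bread)
    var = bread_inv @ meat @ bread_inv / n         # sandwich: A^{-1} B A^{-1} / n
    return theta, var

# Example usage with synthetic i.i.d. data (the setting where this estimator is valid).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
y = X @ np.array([1.0, -0.5, 0.2]) + rng.normal(size=500)
theta_hat, var_hat = ols_sandwich(X, y)
```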
To appear in Volume 20 of the Annual Review of Clinical Psychology, 2023
2023
engagement, oral health, mobile health intervention, racial and ethnic minority group, message development
arXiv:2308.07843
November 3, 2023
dyadic reinforcement learning, online learning, mobile health, algorithm design
Mobile health aims to enhance health outcomes by delivering interventions to individuals as they go about their daily lives. The involvement of care partners and social support networks often proves crucial in helping individuals manage burdensome medical conditions. This presents an opportunity in mobile health to design interventions that target the dyadic relationship (the relationship between a target person and their care partner) with the aim of enhancing social support. In this paper, we develop dyadic RL, an online reinforcement learning algorithm designed to personalize intervention delivery based on contextual factors and past responses of a target person and their care partner. Here, multiple sets of interventions impact the dyad across multiple time intervals. The dyadic RL algorithm is Bayesian and hierarchical. We formally introduce the problem setup, develop dyadic RL, and establish a regret bound. We demonstrate dyadic RL's empirical performance through simulation studies on both toy scenarios and a realistic test bed constructed from data collected in a mobile health study.
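A highly simplified sketch of the posterior-sampling flavor of such an algorithm is shown below, assuming a Bayesian linear reward model and hypothetical feature construction. The paper's dyadic RL additionally specifies the hierarchical structure across dyad members, the multiple time scales, and the regret analysis; none of that is reproduced here.

```python
import numpy as np

class BayesLinearTS:
    """Thompson sampling with a Bayesian linear reward model (illustrative only)."""

    def __init__(self, dim, prior_mean, prior_var=1.0, noise_var=1.0):
        self.mean = np.array(prior_mean, dtype=float)   # prior centered at a pooled estimate
        self.prec = np.eye(dim) / prior_var             # prior precision
        self.noise_var = noise_var
        self.b = self.prec @ self.mean

    def select_action(self, context):
        """Sample a parameter vector from the posterior and act greedily."""
        cov = np.linalg.inv(self.prec)
        theta = np.random.multivariate_normal(self.mean, cov)
        # Features for action 1 (deliver intervention) vs. action 0 (do nothing):
        # the last feature is the action indicator.
        return int(np.dot(np.append(context, 1.0), theta) >
                   np.dot(np.append(context, 0.0), theta))

    def update(self, context, action, reward):
        """Conjugate Bayesian linear-regression update of the posterior."""
        x = np.append(context, float(action))
        self.prec += np.outer(x, x) / self.noise_var
        self.b += x * reward / self.noise_var
        self.mean = np.linalg.solve(self.prec, self.b)

# Example: separate posteriors for the target person and the care partner,
# both initialized from a shared population-level prior mean (a crude stand-in
# for the hierarchical pooling described in the abstract).
shared_prior = np.zeros(4)
target_agent = BayesLinearTS(dim=4, prior_mean=shared_prior)
partner_agent = BayesLinearTS(dim=4, prior_mean=shared_prior)
```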
The focus of the MAPS Center is the development, evaluation, and dissemination of novel research methodologies that are essential to optimize adaptive interventions to combat SUD/HIV.

Project 2 focuses on developing innovative methods that will enable scientists, for the first time, to optimize the integration of human-delivered services with relatively low-intensity adaptation (i.e., adaptive interventions) and digital services with high-intensity adaptation (i.e., JITAIs). This project will develop a new trial design in which individuals can be randomized simultaneously to human-delivered and digital interventions at different time scales (see the sketch below). It includes developing guidelines for trial design, sample size calculators, and statistical analysis methods that will enable scientists to use data from the new experimental design to address novel questions about synergies between human-delivered adaptive interventions and digital JITAIs.

Project 3 focuses on developing innovative methods to optimize JITAIs in which the decision rules are continually updated to ensure effective adaptation as individual needs change and societal trends shift. Integrating approaches from artificial intelligence and statistics, this project will develop algorithms that continually update "population-based" decision rules (designed to work well for all individuals on average) to improve intervention effectiveness. This project will also generalize these algorithms to continually optimize "person-specific" decision rules for JITAIs. The algorithms will be designed specifically to (a) assign each individual the intervention that is right for them at a particular moment; (b) maintain acceptable levels of burden; and (c) maintain engagement.
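As a purely illustrative sketch of the kind of trial design Project 2 describes (all names, time scales, and randomization probabilities below are hypothetical), each participant can be randomized to a human-delivered coaching option on a weekly time scale and, several times per day, to a digital prompt, so that effects at both time scales and their interplay can be studied.

```python
import numpy as np

rng = np.random.default_rng(2024)

def simulate_participant(n_weeks=12, decisions_per_day=3, p_coach=0.5, p_prompt=0.5):
    """Build one participant's randomization schedule at two time scales."""
    schedule = []
    for week in range(n_weeks):
        coach = rng.random() < p_coach                  # weekly human-delivered component
        for day in range(7):
            for slot in range(decisions_per_day):
                prompt = rng.random() < p_prompt        # within-day digital component
                schedule.append({"week": week, "day": day, "slot": slot,
                                 "coach": coach, "prompt": prompt})
    return schedule

schedule = simulate_participant()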
Project 3 of MAPS aims to collaborate with TR&D2 (Murphy) by developing methods for appropriately pooling data from multiple users to speed up learning of both population-based and personalized decision rules. These collaborations will be used to enhance the impact of TR&D2's Aims 2 and 3 and thus lay the foundation for successful future research projects.

Project 3 of MAPS also aims to collaborate with TR&D1 (Marlin) by utilizing TR&D1's advances in representing and propagating uncertainty in Project 3's development of methods for adapting the timing and location of delivery of different intervention prompts. These collaborations will increase the impact of TR&D1's Aims 1 and 2.

Project 2 of MAPS plans to collaborate with TR&D1 (Marlin) to develop a composite substance use risk indicator, derived from sensor data, that can be assessed at different time scales and hence can inform the adaptation of both human-delivered and digital interventions. Project 2 also plans to collaborate with TR&D2 (Murphy) to develop optimization methods for learning which types of digital interventions are best delivered, and under what conditions, in a setting in which non-digital (human-delivered) interventions are also provided; this is an extreme case of TR&D2's Aim 3, which focuses on multiple intervention components delivered at different time scales and with different short-term objectives. As such, this collaboration has the potential to synergistically enhance both TR&Ds' aims as well as MAPS Project 2's aims.