Technology Research & Development Project 2 (TR&D2): Optimization

Dynamic Optimization of Continuously Adapting mHealth Interventions via Prudent, Statistically Efficient, and Coherent Reinforcement Learning

TR&D2 Lead: Susan Murphy

Co-Investigators: Santosh Kumar, Benjamin Marlin

Mobile health (mHealth) interventions have typically used hand-crafted decision rules that map biomarkers of an individual’s state to the selection of interventions. Recently, reinforcement learning (RL) has emerged as a promising approach for online optimization of decision rules. Continuous, passive detection of the individual’s state using mHealth biomarkers enables dynamic deployment of decision rules at the right moment, i.e., as and when events of interest are detected from sensors. RL-based optimization methods that leverage this new capability created by sensor-based biomarkers can enable the development and optimization of temporally precise mHealth interventions, overcoming the significant limitations of static, one-size-fits-all decision rules. Such next-generation interventions have the potential to lead to greater treatment efficacy and improved long-term engagement.

However, several critical challenges stand in the way of effective, real-world RL-based interventions. These include the need to learn efficiently from limited interactions with an individual, to account for longer-term effects of intervention decisions (i.e., to avoid habituation and ensure continued engagement), and to accommodate multiple intervention components operating at different time scales and targeting different outcomes. As a result, the use of RL in mHealth interventions has so far been largely limited to a small number of studies using basic RL methods.

To address these critical challenges, TR&D2 builds on more precise biomarkers of context, including TR&D1 risk and engagement scores, to develop, evaluate, and disseminate robust and data-efficient RL methods and tools. These methods continually personalize the decision rules governing the selection, adaptation, and delivery timing of core intervention components so as to maximize long-term therapeutic efficacy and engagement for every individual.

Research Goals

TR&D2 is conducting the following innovative research to address the technological challenges described above:

  • Account for delayed treatment effects via prudent learning of decision rules. Generalize current myopic bandit RL methods to enable learning of non-myopic decision rules that account for delayed intervention effects. A particular focus is delayed effects due to intervention burden.
  • Enable efficient personalization by optimizing data sharing across users. Develop RL methods that personalize decision rules for every individual by optimally leveraging data across a population or cohort to accelerate learning.
  • Enable coherent learning of decision rules across intervention components operating at different time scales and with different objectives. Increasingly, mobile interventions include multiple components that target different outcomes (e.g., stress, inactivity) and operate at different time scales (e.g., within day, daily). TR&D2 will develop approaches that use distal health outcomes to guide learning for these multiple components so as to minimize negative interactions.

Technological Resources

TR&D2 is producing the following technological resources for the community:

  • A toolkit for online intervention optimization. This toolkit will include cloud-based modules for personalizing adaptation rules as well as smartphone modules implementing real-time intervention selection. This system component will be implemented as a cloud-based microservice that can be flexibly integrated into existing mobile or cloud-based platforms developed by our collaborators as well as by members of the broader research community.
  • A reference tutorial for use of the online intervention optimization toolkit.

Both tools will be implemented within the mDOT Center software framework and will enable a broad segment of the mHealth research community to continuously optimize mHealth intervention rules, so as to achieve optimal efficacy and engagement for individuals despite dynamic variations in their physical, behavioral, social, and environmental states.

Impact on Science & Society

Research and development by TR&D2 will significantly advance RL methodology for personalizing decision rules, in particular with regard to online algorithms that personalize interventions for each user by appropriately pooling data across multiple users. Through work with collaborative projects (CPs) and service projects (SPs), TR&D2 will advance the understanding of the impact of personalization via test cases in obesity (SP4), engagement (CP1), cigarette smoking (CP1, SP5), oral health (CP2), and physical activity (CP3). These test cases share intervention components (goal setting/planning and just-in-time support) that are common to many mHealth interventions. TR&D2’s approaches will be used by our CPs to test whether particular aspects of state arising in dynamic behavioral theories should be intervened on in order to break links between state and adverse health outcomes. TR&D2 will inform intervention research by disseminating software systems and tools that will help to extend behavioral interventions outside of the clinical setting. This is particularly relevant for patients who have limited access to clinics. The personalization methods developed here will lay the groundwork for future projects building longer-term support, both by accommodating user non-stationarity and by maintaining longer-term user engagement. This work will also help establish the area of multi-agent RL for mHealth.

TR&D2 Team

Susan Murphy, Ph.D.

TR&D2 Lead

Dr. Susan Murphy is Professor of Statistics and Computer Science and the Radcliffe Alumnae Professor at the Radcliffe Institute, Harvard University. She directs the Statistical Reinforcement Learning Lab at Harvard University. Her research concerns clinical trial design and the development of data analytic methods for informing multi-stage decision making in health. In particular, her work addresses (1) constructing individualized sequences of treatments (a.k.a. adaptive interventions) for use in informing clinical decision making and (2) constructing real-time individualized sequences of treatments (a.k.a. Just-in-Time Adaptive Interventions) delivered by mobile devices. Murphy has developed a formal model of this decision-making process and an innovative design for clinical trials, the Sequential Multiple Assignment Randomized Trial (SMART), which allows researchers to optimize adaptive interventions. She was selected as a MacArthur Fellow in 2013, elected a member of the National Academy of Medicine in 2014, and elected a member of the National Academy of Sciences in 2016.

Santosh Kumar, Ph.D.

Lead PI, Center Director, TR&D1, TR&D2, TR&D3

University of Memphis

Dr. Santosh Kumar is the Lillian and Morrie Moss Chair of Excellence Professor in the Department of Computer Science at the University of Memphis and the Director of the NIH Center of Excellence for Mobile Sensor Data-to-Knowledge (MD2K), which is headquartered at the University of Memphis. He received his Ph.D. in Computer Science and Engineering from The Ohio State University in 2006, where his dissertation won a presidential fellowship. In 2010, Popular Science magazine named him one of America’s ten most brilliant scientists under the age of 38 (the “Brilliant Ten”). In 2011, he chaired the “mHealth Evidence” meeting jointly organized by NIH, NSF, RWJF, and McKesson Foundation to establish evidence requirements for mHealth. In 2013, he was invited to meet with the NIH Director to advise him on NIH efforts in the area of mHealth and was invited to the White House to give a talk on the future of biosensors. In 2014, he co-organized and co-chaired the NSF-NIH Workshop on Computing Challenges in Future Mobile Health (mHealth) Systems and Applications. He holds the distinction of receiving the largest grants from both NIH ($10.8 million in 2014) and NSF ($4 million in 2016) in the history of the University of Memphis. Santosh’s research seeks to define new frontiers in the discipline of mobile health (mHealth). His decade-long work has involved collecting mobile sensor data from over 100 human volunteers for 25,000+ hours in their natural environments as part of various scientific user studies. His collaborative research involves more than twenty faculty members from fifteen institutions, spanning a variety of disciplines, making his projects highly transdisciplinary.

Benjamin Marlin, Ph.D.

Co-I, TR&D1, TR&D2

University of Massachusetts Amherst

Dr. Benjamin Marlin joined the College of Information and Computer Sciences at the University of Massachusetts Amherst in 2011. There, he co-directs the Machine Learning for Data Science lab. His current research centers on the development of customized probabilistic models and algorithms for time series, with applications to the analysis of electronic health records and mobile health data. His recent work includes probabilistic models for analyzing wireless ECG data, detection of cocaine use from wireless ECG, hierarchical activity recognition from on-body sensor data with applications to smoking and eating detection, and methods for mitigating lab-to-field generalization loss in mobile health studies. Marlin is a 2014 NSF CAREER award recipient and a 2013 Yahoo! Faculty Research Engagement Program award recipient. His research has also been supported by the National Institutes of Health, the Patient-Centered Outcomes Research Institute, and the US Army Research Laboratory. Prior to joining UMass Amherst, Marlin was a fellow of the Pacific Institute for the Mathematical Sciences and the Killam Trusts at the University of British Columbia. He completed his Ph.D. in machine learning in the Department of Computer Science at the University of Toronto.