The mDOT Center

Transforming health and wellness via temporally-precise mHealth interventions

TR&D1: Discovery


Enabling the Discovery of Temporally-Precise Intervention Targets and Timing Triggers from mHealth Biomarkers via Uncertainty-Aware Modeling of Personalized Risk Dynamics

Heteroscedastic Temporal Variational Autoencoder for Irregularly Sampled Time Series
Authors:
Publication Venue:

International Conference on Learning Representations (ICLR)

Publication Date:

January 28, 2022

Keywords:

irregular sampling, uncertainty, imputation, interpolation, multivariate time series, missing data, variational autoencoder

Related Project:

Our goal is to model and represent uncertainty in mHealth biomarkers in order to account for multifaceted uncertainty during momentary decision making when selecting, adapting, and delivering temporally-precise mHealth interventions.  In this period, we extended our previous deep learning approach, Multi-Time Attention Networks, to enable improved representation of output uncertainty.  Our new approach preserves the idea of learned temporal similarity functions and adds heteroscedastic output uncertainty.  The new framework is referred to as the Heteroscedastic Temporal Variational Autoencoder (HeTVAE) and models real-valued multivariate data.

Abstract:

Irregularly sampled time series commonly occur in several domains where they present a significant challenge to standard deep learning models. In this paper, we propose a new deep learning framework for probabilistic interpolation of irregularly sampled time series that we call the Heteroscedastic Temporal Variational Autoencoder (HeTVAE). HeTVAE includes a novel input layer to encode information about input observation sparsity, a temporal VAE architecture to propagate uncertainty due to input sparsity, and a heteroscedastic output layer to enable variable uncertainty in output interpolations. Our results show that the proposed architecture is better able to reflect variable uncertainty through time due to sparse and irregular sampling than a range of baseline and traditional models, as well as recently proposed deep latent variable models that use homoscedastic output layers.
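To make the distinction between homoscedastic and heteroscedastic output layers concrete, the following is a minimal sketch, not the HeTVAE implementation. It assumes a Gaussian output distribution and compares the average negative log-likelihood when every prediction shares one variance against the case where each prediction carries its own variance. The function names and the toy numbers are illustrative assumptions.

```python
import math

def gaussian_nll(y, mean, log_var):
    """Negative log-likelihood of y under a Gaussian N(mean, exp(log_var))."""
    return 0.5 * (math.log(2 * math.pi) + log_var
                  + (y - mean) ** 2 / math.exp(log_var))

def homoscedastic_nll(ys, means, shared_log_var):
    """Average NLL when all time points share a single output variance."""
    return sum(gaussian_nll(y, m, shared_log_var)
               for y, m in zip(ys, means)) / len(ys)

def heteroscedastic_nll(ys, means, log_vars):
    """Average NLL when each time point has its own predicted variance."""
    return sum(gaussian_nll(y, m, lv)
               for y, m, lv in zip(ys, means, log_vars)) / len(ys)

# Toy example (hypothetical values): the third point lies far from any
# observation, so its interpolation error is large.
ys = [0.0, 0.1, 2.0]
means = [0.0, 0.0, 0.0]
shared_log_var = math.log(0.05)
per_point_log_vars = [math.log(0.05), math.log(0.05), math.log(4.0)]

homo = homoscedastic_nll(ys, means, shared_log_var)
hetero = heteroscedastic_nll(ys, means, per_point_log_vars)
print(f"homoscedastic NLL:   {homo:.3f}")
print(f"heteroscedastic NLL: {hetero:.3f}")
```

In this toy case the heteroscedastic model scores better because it can widen its predictive variance at the poorly supported point instead of being overconfident there.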

TL;DR:

We present a new deep learning architecture for probabilistic interpolation of irregularly sampled time series.

BayesLDM: A Domain-Specific Language for Probabilistic Modeling of Longitudinal Data
Authors:
Publication Venue:

IEEE/ACM international conference on Connected Health: Applications, Systems and Engineering Technologies (CHASE)

Publication Date:

September 12, 2022

Keywords:

Bayesian inference, probabilistic programming, time series, missing data, Bayesian imputation, mobile health

Related Projects:

We have developed a toolbox for the specification and estimation of mechanistic models in the dynamic Bayesian network family.  This toolbox focuses on making it easier to specify probabilistic dynamical models for time series data and to perform Bayesian inference and imputation in the specified model given incomplete data as input.  The toolbox is referred to as BayesLDM.  We have been working with members of CP3, CP4, and TR&D2 to develop offline data analysis and simulation models using this toolbox.  We are also currently in discussions with members of CP4 to deploy the toolbox’s Bayesian imputation methods within a live controller optimization trial in the context of an adaptive walking intervention.

Abstract:

In this paper we present BayesLDM, a system for Bayesian longitudinal data modeling consisting of a high-level modeling language with specific features for modeling complex multivariate time series data coupled with a compiler that can produce optimized probabilistic program code for performing inference in the specified model. BayesLDM supports modeling of Bayesian network models with a specific focus on the efficient, declarative specification of dynamic Bayesian Networks (DBNs). The BayesLDM compiler combines a model specification with inspection of available data and outputs code for performing Bayesian inference for unknown model parameters while simultaneously handling missing data. These capabilities have the potential to significantly accelerate iterative modeling workflows in domains that involve the analysis of complex longitudinal data by abstracting away the process of producing computationally efficient probabilistic inference code. We describe the BayesLDM system components, evaluate the efficiency of representation and inference optimizations and provide an illustrative example of the application of the system to analyzing heterogeneous and partially observed mobile health data.

TL;DR:

We present a toolbox for the specification and estimation of mechanistic models in the dynamic Bayesian network family.

PulseImpute: A Novel Benchmark Task for Pulsative Physiological Signal Imputation
Authors:
Publication Venue:

Neural Information Processing Systems (NeurIPS), Track on Datasets and Benchmarks

Publication Date:

September 16, 2022

Keywords:

missingness, imputation, mHealth, sensors, time-series, self-attention, pulsative, physiological, dataset

We developed a state-of-the-art attention-based deep learning transformer architecture that can learn to leverage the quasi-periodic signal structure to perform accurate imputation in the face of substantial amounts of missingness, such as the absence of multiple beats.  We have validated that this novel transformer-based imputation method outperforms existing standard imputation baselines.

Abstract:

The promise of Mobile Health (mHealth) is the ability to use wearable sensors to monitor participant physiology at high frequencies during daily life to enable temporally-precise health interventions. However, a major challenge is frequent missing data. Despite a rich imputation literature, existing techniques are ineffective for the pulsative signals which comprise many mHealth applications, and a lack of available datasets has stymied progress. We address this gap with PulseImpute, the first large-scale pulsative signal imputation challenge which includes realistic mHealth missingness models, an extensive set of baselines, and clinically-relevant downstream tasks. Our baseline models include a novel transformer-based architecture designed to exploit the structure of pulsative signals. We hope that PulseImpute will enable the ML community to tackle this important and challenging task.

TL;DR:

PulseImpute is the first mHealth pulsative signal imputation challenge which includes realistic missingness models, clinical downstream tasks, and an extensive set of baselines, including an augmented transformer that achieves SOTA performance.

mRisk: Continuous Risk Estimation for Smoking Lapse from Noisy Sensor Data with Incomplete and Positive-Only Labels
Authors:
Publication Venue:

ACM on Interactive, Mobile, Wearable, and Ubiquitous Technologies (IMWUT)

Publication Date:

September 7, 2022

Keywords:

behavioral intervention, human-centered computing, risk prediction, smoking cessation, ubiquitous and mobile computing design and evaluation methods, wearable sensors

Related Projects:
Estimation of the continuous risk state may be critical for delivering temporally-precise interventions and treatment adaptations in cessation programs. Continuous sensor data collected from wearables and smartphones to capture risk factors of adverse behaviors in the natural environment are usually noisy and incomplete. For adverse behavioral events such as a smoking lapse, capturing the precise timing of each lapse may not be feasible, as sensors may not be worn at the time of a lapse or the lapse may not be accurately detected due to the imperfection of the machine learning models used to detect smoking events via hand-to-mouth gestures.  Therefore, only a few positive events (i.e., smoking lapses in a cessation attempt) are available. Confirmed negative labels can be assigned to a block of sensor data corresponding to a prediction window only if the entire time period is confirmed to have no high-risk moment.  As not all high-risk moments result in a lapse, labeling a block of sensor data as negative is difficult for such events.  We addressed each of these challenges in developing the mRisk model.  Specifically, we encoded sensor data as events to handle noise and missingness, modeled the historical influence of recent psychological, behavioral, and environmental events via a deep learning model, and addressed the lack of negative labels and the small number of positive labels by using a positive-unlabeled framework with a novel loss function.
Abstract:

Passive detection of risk factors (that may influence unhealthy or adverse behaviors) via wearable and mobile sensors has created new opportunities to improve the effectiveness of behavioral interventions. A key goal is to find opportune moments for intervention by passively detecting rising risk of an imminent adverse behavior. This has been difficult, however, due to substantial noise in the data collected by sensors in the natural environment and a lack of reliable label assignment of low- and high-risk states to the continuous stream of sensor data. In this paper, we propose an event-based encoding of sensor data to reduce the effect of noise and then present an approach to efficiently model the historical influence of recent and past sensor-derived contexts on the likelihood of an adverse behavior. Next, to circumvent the lack of any confirmed negative labels (i.e., time periods with no high-risk moment), and only a few positive labels (i.e., detected adverse behavior), we propose a new loss function. We use 1,012 days of sensor and self-report data collected from 92 participants in a smoking cessation field study to train deep learning models to produce a continuous risk estimate for the likelihood of an impending smoking lapse. The risk dynamics produced by the model show that risk peaks an average of 44 minutes before a lapse. Simulations on field study data show that using our model can create intervention opportunities for 85% of lapses with 5.5 interventions per day.
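For readers unfamiliar with positive-unlabeled (PU) learning, the sketch below illustrates the general idea of training with only positive and unlabeled windows. It is not the mRisk loss function, which is described in the paper. Instead it uses the standard non-negative PU risk estimator from the PU learning literature with a sigmoid loss. The function names, the class prior, and the toy risk scores are all illustrative assumptions.

```python
import math

def sigmoid_loss(score, label):
    """Sigmoid loss l(z, y) = 1 / (1 + exp(y * z)), with label in {+1, -1}."""
    return 1.0 / (1.0 + math.exp(label * score))

def mean(values):
    return sum(values) / len(values)

def nn_pu_risk(pos_scores, unl_scores, class_prior):
    """Non-negative PU risk estimate (a standard estimator, not mRisk's loss).

    pos_scores:  model scores for windows known to precede a lapse
    unl_scores:  model scores for unlabeled windows (true class unknown)
    class_prior: assumed fraction of unlabeled windows that are positive
    """
    pos_as_pos = mean([sigmoid_loss(s, +1) for s in pos_scores])
    pos_as_neg = mean([sigmoid_loss(s, -1) for s in pos_scores])
    unl_as_neg = mean([sigmoid_loss(s, -1) for s in unl_scores])
    # Estimate the negative-class risk from unlabeled data, then clip at
    # zero so the estimate cannot become negative.
    neg_risk = unl_as_neg - class_prior * pos_as_neg
    return class_prior * pos_as_pos + max(0.0, neg_risk)

# Toy scores (hypothetical): a sensible scorer versus its sign-flipped version.
pos = [3.0, 2.5]
unl = [-3.0, -2.0, -2.5, 3.0]
good_risk = nn_pu_risk(pos, unl, class_prior=0.25)
bad_risk = nn_pu_risk([-s for s in pos], [-s for s in unl], class_prior=0.25)
print(f"good scorer risk: {good_risk:.3f}, flipped scorer risk: {bad_risk:.3f}")
```

The estimator never needs a confirmed negative label: the unlabeled windows stand in for the negative class after subtracting the expected contribution of the positives hidden among them.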

TL;DR:

We present a model for identifying ideal moments for intervention by passively detecting risk of an imminent adverse behavior.

Kernel Multimodal Continuous Attention
Authors:
Publication Venue:

Neural Information Processing Systems (NeurIPS)

Publication Date:

October 31, 2022

Keywords:

attention, continuous attention, kernel methods

One technical challenge in modeling missingness in biomarker streams is the need to develop flexible attention mechanisms that can learn to focus on the relevant aspects of an input signal.  We have completed the development of a novel continuous-time attention model which is capable of learning multimodal densities, meaning that the attention density can be focused on multiple signal regions simultaneously.  Classical solutions like Gaussian mixtures have dense support, with the result that all regions of a signal have some probability mass, making it difficult to focus the attention on key regions and ignore irrelevant ones.  Our work introduces kernel deformed exponential families, a sparse class of multimodal attention densities.

We theoretically analysed the normalization, approximation, and numerical integration properties of this density class.  We applied these densities in analyzing real-world time series data and showed that the densities often capture the most salient aspects of an input signal, and outperform baseline density models on a diverse set of tasks.

Abstract:

Attention mechanisms take an expectation of a data representation with respect to probability weights. Recently, Martins et al. (2020, 2021) proposed continuous attention mechanisms, focusing on unimodal attention densities from the exponential and deformed exponential families: the latter has sparse support. Farinhas et al. (2021) extended this to multimodality via Gaussian mixture attention densities. In this paper, we extend this to kernel exponential families (Canu and Smola 2006) and our new sparse counterpart, kernel deformed exponential families. Theoretically, we show new existence results for both kernel exponential and deformed exponential families, and that the deformed case has similar approximation capabilities to kernel exponential families. Lacking closed form expressions for the context vector, we use numerical integration: we show exponential convergence for both kernel exponential and deformed exponential families. Experiments show that kernel continuous attention often outperforms unimodal continuous attention, and the sparse variant tends to highlight peaks of time series.
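The idea of a continuous attention context computed by numerical integration can be illustrated with a small sketch. It does not implement kernel deformed exponential families. As a simplified stand-in, it builds a multimodal density with sparse support by clipping a thresholded sum of RBF kernels at zero and normalizing numerically, then takes the expectation of a toy signal under that density with the trapezoidal rule. All function names and parameter values are illustrative assumptions.

```python
import math

def trapezoid(ys, dx):
    """Trapezoidal rule on an evenly spaced grid."""
    return dx * (sum(ys) - 0.5 * (ys[0] + ys[-1]))

def sparse_multimodal_density(grid, centers, width, threshold):
    """Stand-in density: clipped sum of RBF kernels, normalized numerically.

    Clipping at zero gives exactly zero mass away from the centers, unlike a
    Gaussian mixture, which has nonzero density everywhere.
    """
    scores = [sum(math.exp(-((t - c) / width) ** 2) for c in centers) - threshold
              for t in grid]
    unnormalized = [max(0.0, s) for s in scores]
    dx = grid[1] - grid[0]
    normalizer = trapezoid(unnormalized, dx)
    return [u / normalizer for u in unnormalized]

def attention_context(grid, density, signal):
    """Context value: the expectation of the signal under the density."""
    dx = grid[1] - grid[0]
    return trapezoid([p * v for p, v in zip(density, signal)], dx)

n = 1001
grid = [i / (n - 1) for i in range(n)]
# Toy signal with peaks at t = 0.25 and t = 0.75.
signal = [math.sin(2 * math.pi * t) ** 2 for t in grid]
density = sparse_multimodal_density(grid, centers=[0.25, 0.75],
                                    width=0.05, threshold=0.5)
context = attention_context(grid, density, signal)
print(f"context value: {context:.3f}")
print(f"density at t=0.5: {density[500]}")
```

Because the density is zero between the two modes, the context value reflects only the two peak regions of the signal.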

TL;DR:

We extend continuous attention from unimodal (deformed) exponential families and Gaussian mixture models to kernel exponential families and a new kernel deformed sparse counterpart.

Heteroscedastic Temporal Variational Autoencoder for Irregularly Sampled Time Series

https://github.com/reml-lab/hetvae

Authors:
Publication Venue:

International Conference on Learning Representations (ICLR)

Publication Date:

January 28, 2022

Languages:

Jupyter Notebook

Python

BayesLDM: A Domain-Specific Language for Probabilistic Modeling of Longitudinal Data
Authors:
Publication Venue:

IEEE/ACM international conference on Connected Health: Applications, Systems and Engineering Technologies (CHASE)

Publication Date:

September 12, 2022

Language:

Python

PulseImpute: A Novel Benchmark Task for Pulsative Physiological Signal Imputation
Authors:
Publication Venue:

Neural Information Processing Systems (NeurIPS)

Publication Date:

September 16, 2022

Language:

Python

mRisk: Continuous Risk Estimation for Smoking Lapse from Noisy Sensor Data with Incomplete and Positive-Only Labels
Authors:
Publication Venue:

ACM on Interactive, Mobile, Wearable, and Ubiquitous Technologies (IMWUT)

Publication Date:

September 7, 2022

Languages:

Jupyter Notebook

Python

Kernel Multimodal Continuous Attention
Authors:
Publication Venue:

Neural Information Processing Systems (NeurIPS)

Publication Date:

October 31, 2022

Languages:

Jupyter Notebook

Python

Heteroscedastic Temporal Variational Autoencoder for Irregularly Sampled Time Series
HeTVAE is a deep learning framework for probabilistic interpolation of irregularly sampled or sparse time series data. HeTVAE has three associated datasets:

Real World Datasets:
Synthetic Dataset:
  • Synthetic Data Generation: We generate a synthetic dataset consisting of 2000 trajectories, each consisting of 50 time points with values between 0 and 1. We fix 10 reference time points and draw values for each from a standard normal distribution. We then use an RBF kernel smoother with a fixed bandwidth of α = 120.0 to construct local interpolations over the 50 time points. Finally, we randomly sample 3 to 10 observations from each trajectory to simulate a sparse and irregularly sampled univariate time series.
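The generation procedure described above can be sketched as follows. This is a minimal reconstruction from the text rather than the released data-generation code. It assumes that the 10 reference points and the 50 time points are evenly spaced on (0, 1], and that the kernel smoother uses normalized weights of the form exp(-α (t - r)²). The function names are hypothetical.

```python
import math
import random

def make_trajectory(rng, n_ref=10, n_times=50, alpha=120.0):
    """Generate one dense trajectory with an RBF kernel smoother.

    Assumption: reference points and time points are evenly spaced on (0, 1].
    """
    ref_times = [(k + 1) / n_ref for k in range(n_ref)]
    ref_values = [rng.gauss(0.0, 1.0) for _ in range(n_ref)]
    times = [(i + 1) / n_times for i in range(n_times)]
    values = []
    for t in times:
        # Normalized RBF weights give a local interpolation of the
        # reference values around time t.
        weights = [math.exp(-alpha * (t - r) ** 2) for r in ref_times]
        total = sum(weights)
        values.append(sum(w * z for w, z in zip(weights, ref_values)) / total)
    return times, values

def sparsify(rng, times, values, min_obs=3, max_obs=10):
    """Keep a random subset of 3 to 10 observations, in time order."""
    n_obs = rng.randint(min_obs, max_obs)
    keep = sorted(rng.sample(range(len(times)), n_obs))
    return [times[i] for i in keep], [values[i] for i in keep]

rng = random.Random(0)
dataset = []
for _ in range(2000):
    dense_times, dense_values = make_trajectory(rng)
    dataset.append(sparsify(rng, dense_times, dense_values))

obs_times, obs_values = dataset[0]
print(f"{len(dataset)} trajectories; first has {len(obs_times)} observations")
```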
Authors:
Publication Venue:

International Conference on Learning Representations (ICLR)

Publication Date:

January 28, 2022

License:
PulseImpute: A Novel Benchmark Task for Pulsative Physiological Signal Imputation

The novel PulseImpute dataset is the first large-scale dataset containing complex imputation tasks for pulsative biophysical signals.  State-of-the-art imputation methods from the time series literature are shown to exhibit poor performance on PulseImpute, demonstrating that the missingness patterns emerging in mHealth applications represent a unique and important class of imputation problems.  By releasing this dataset and a new state-of-the-art baseline algorithm, we hope to spur the ML community to begin addressing these challenging problems.

Authors:
Publication Venue:

Neural Information Processing Systems (NeurIPS)

Publication Date:

November 28, 2022

License:

The past decade has seen tremendous advances in the ability to compute a diverse array of mobile sensor-based biomarkers in order to passively estimate health states, activities, and associated contexts (e.g. physical activity, sleep, smoking, mood, craving, stress, and geospatial context). Researchers are now engaged in the conduct of both observational and interventional field studies of increasing complexity and length that leverage mHealth sensor and biomarker technologies combined with the collection of measures of disease progression and other outcomes. 

 

As a result of the expansion of the set of available mHealth biomarkers and the push toward long-term, real-world deployment of mHealth technologies, a new set of critical gaps has emerged that were previously obscured by the focus of the field on smaller-scale proof-of-concept studies and the investigation of single biomarkers in isolation.

Solutions for Missing Sensor & Biomarker Data

First, the issue of missing sensor and biomarker data in mHealth field studies has quickly become a critical problem that directly and significantly impacts many of our CPs. Issues including intermittent wireless dropouts, wearables and smartphones running out of battery power, participants forgetting to carry or wear devices, and participants exercising privacy controls can all contribute to complex patterns of missing data that significantly complicate data analysis and limit the effectiveness of sensor-informed mHealth interventions.

High-Quality, Compact, & Interpretable Feature Representations

Second, with increasing interest in the use of reinforcement learning methods to provide online adaptation of interventions for every individual, there is an urgent need for high-quality, compact and interpretable feature representations that can enable more effective learning under strict budgets on the number of interactions with patients.

Methods for Deriving High-Level Knowledge & Supporting Causal Hypothesis Generation

Finally, as in other areas that are leveraging machine learning methods to drive scientific discovery and support decision making, mHealth needs methods that can be used to derive high-level knowledge and support causal hypothesis generation based on complex, non-linear models fit to biomarker time series data.

Sayma Akther, PhD

Assistant Professor


Supriya Nagesh, PhD

Applied Scientist


Varol Burak Aydemir, PhD

Principal Algorithms Engineer


Satya Shukla, PhD

Senior Research Scientist


Soujanya Chatterjee, PhD

Applied Scientist II


Md Azim Ullah, PhD

Applied Scientist


Alexander Moreno, PhD

Machine Learning Scientist


  1. S.N. Shukla, B.M. Marlin. Heteroscedastic Temporal Variational Autoencoder For Irregularly Sampled Time Series. In Proceedings of the International Conference on Learning Representations. 2022.
  2. Tung, K., Torre, S.D., Mistiri, M.E., Braganca, R.B., Hekler, E.B., Pavel, M., Rivera, D.E., Klasnja, P., Spruijt-Metz, D., & Marlin, B.M. (2022). BayesLDM: A Domain-Specific Language for Probabilistic Modeling of Longitudinal Data. Accepted at IEEE/ACM international conference on Connected Health: Applications, Systems and Engineering Technologies (CHASE) 2022.  ArXiv, abs/2209.05581.
  3. M. A. Xu, A. Moreno, S. Nagesh, V. B. Aydemir, D. W. Wetter, S. Kumar, and J. M. Rehg. PulseImpute: A Novel Benchmark Task for Pulsative Physiological Signal Imputation. Proceedings 36th Conference on Neural Information Processing Systems (NeurIPS), Track on Datasets and Benchmarks, 2022. Accepted for publication.  NIHMS1839168.
  4. Md Azim Ullah, Soujanya Chatterjee, Christopher P. Fagundes, Cho Lam, Inbal Nahum-Shani, James M. Rehg, David W. Wetter, and Santosh Kumar. 2022. mRisk: Continuous Risk Estimation for Smoking Lapse from Noisy Sensor Data with Incomplete and Positive-Only Labels. Proc. ACM Interact. Mob. Wearable Ubiquitous Technol. 6, 3, Article 143 (September 2022), 29 pages.
  5. A. Moreno, Z. Wu, S. Nagesh, W. Dempsey, and J. M. Rehg. Kernel Multimodal Continuous Attention. Proceedings 36th Conference on Neural Information Processing Systems (NeurIPS), 2022. Accepted for publication.
  1. S. Kumar, “Challenges and Opportunities in Trustworthy AI for Health and Wellness” ACM SIGKDD Trustworthy AI Day, 08/15/22.
  2. S. Kumar, “Detecting and Characterizing Stress in Daily Life,” Keynote Speech at IEEE EMBC Workshop on Detection of Stress and Mental Health Using Wearable Sensors, 07/11/2022.
  3. S. Kumar, “Can Sharing Anonymous Wrist-worn Accelerometry Data Re-identify You,” EECS Department, University of California, Irvine, 06/03/2022.
  4. S. Kumar, “Can Sharing Anonymous Wrist-worn Accelerometry Data Re-identify You,” CSE Department, The Ohio State University, 04/29/2022.
  5. S. Kumar, “Persuasive AI to Improve Health and Wellness,” Indo-US Roundtable, 03/24/2022.
  6. S. Kumar, “Wearable AI for Designing, Optimizing, and Delivering Temporally-Precise mHealth Interventions,” mHealth Special Session at International Conference on Network, Systems, and Security (NSySs’21), 12/23/2021.
  7. S.N. Shukla. "Heteroscedastic Temporal Variational Autoencoder For Irregularly Sampled Time Series." International Conference on Learning Representations. 4/27/2022.

James Rehg, PhD

Deputy Center Director, TR&D1 Lead


Santosh Kumar, PhD

Lead PI, Center Director, TR&D1, TR&D2, TR&D3


Benjamin Marlin, PhD

Co-Investigator, TR&D1, TR&D2



Sameer Neupane

Doctoral Student


Maxwell Xu

Doctoral Student


Mithun Saha

Doctoral Student


Karine Karine

Doctoral Student


Hui Wei

Doctoral Student