Biostat 245 - UCLA / Winter 2021
Instructor: Donatello Telesca
Webinar: W 3:30 - 4:50 [Zoom]
[1/13] Alexander Petersen (BYU and UCSB Statistics)
Title: Partial Separability and Functional Graphical Models
Abstract: The covariance structure of multivariate functional data can be highly complex, especially if the multivariate dimension is large, making the extension of statistical methods for standard multivariate data to the functional data setting quite challenging. For example, Gaussian graphical models have recently been extended to the setting of multivariate functional data by applying multivariate methods to the coefficients of truncated basis expansions. However, a key difficulty compared to multivariate data is that the covariance operator is compact, and thus not invertible. The methodology in this paper addresses the general problem of covariance modeling for multivariate functional data, and functional Gaussian graphical models in particular. As a first step, a new notion of separability for multivariate functional data, termed partial separability, is proposed, leading to a novel Karhunen-Loève-type expansion for such data. Next, the partial separability structure is shown to yield a well-defined Gaussian graphical model that can be identified with a sequence of finite-dimensional graphical models, each of fixed dimension. This motivates a simple and efficient estimation procedure based on the joint graphical lasso. Empirical performance of the method for graphical model estimation is assessed through simulation and through an analysis of functional brain connectivity during a motor task. [Lecture Recording]
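One natural way to sketch the partial separability step, assuming all p component curves are observed on a common grid of T points: take the leading eigenvectors of the average of the p marginal covariance surfaces as a shared basis, project every curve onto that basis, and form one p × p covariance matrix of the scores per basis function. Those matrices are the inputs that a (joint) graphical lasso would sparsify; here that step is replaced by a plain, unpenalized matrix inverse. The function name `partially_separable_fit` and the synthetic data are hypothetical; this is an illustration, not the speaker's implementation.

```python
import numpy as np

def partially_separable_fit(X, n_basis):
    """Estimate a shared basis and one p x p score covariance per basis function.

    X: array of shape (n, p, T), i.e. n samples of p curves, each observed on
    the same grid of T points (an assumption of this sketch). Grid inner
    products use unit spacing, so phi has orthonormal columns in R^T.
    """
    n, p, T = X.shape
    Xc = X - X.mean(axis=0, keepdims=True)            # center each variable's curves
    # Average of the p marginal covariance surfaces C_jj(s, t).
    C_bar = np.einsum("ijs,ijt->st", Xc, Xc) / (n * p)
    evals, evecs = np.linalg.eigh(C_bar)              # eigenvalues in ascending order
    order = np.argsort(evals)[::-1][:n_basis]         # keep the n_basis largest
    phi = evecs[:, order]                             # shared basis, shape (T, n_basis)
    theta = np.einsum("ijt,tl->ijl", Xc, phi)         # scores <X_ij, phi_l>, shape (n, p, n_basis)
    # sigmas[l, j, k] = sample covariance of the scores of variables j and k on phi_l.
    sigmas = np.einsum("ijl,ikl->ljk", theta, theta) / n
    return phi, evals[order], sigmas

# Synthetic partially separable data: every variable uses the same two basis
# vectors, but the p x p score covariance differs between the two.
rng = np.random.default_rng(0)
n, p, T = 500, 3, 50
t = np.arange(T) / T
B = np.stack([np.sqrt(2 / T) * np.sin(2 * np.pi * t),
              np.sqrt(2 / T) * np.cos(2 * np.pi * t)], axis=1)   # (T, 2), orthonormal
cov1 = np.array([[2.0, 0.8, 0.0], [0.8, 2.0, 0.0], [0.0, 0.0, 2.0]])
cov2 = np.array([[1.0, 0.0, 0.4], [0.0, 1.0, 0.0], [0.4, 0.0, 1.0]])
s1 = rng.multivariate_normal(np.zeros(p), cov1, size=n)           # (n, p)
s2 = rng.multivariate_normal(np.zeros(p), cov2, size=n)
X = s1[:, :, None] * B[:, 0] + s2[:, :, None] * B[:, 1]          # (n, p, T)

phi, evals, sigmas = partially_separable_fit(X, n_basis=2)
# Unpenalized stand-in for the (joint) graphical lasso: one precision matrix per basis function.
precisions = np.linalg.inv(sigmas)
```

Because the synthetic curves lie exactly in the span of the two columns of `B`, the estimated basis `phi` spans the same subspace up to rotation and sign, which is a quick sanity check on the construction.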
[1/20] Guido Montufar (UCLA Mathematics)
Title: Implicit bias of gradient descent for mean squared error regression with wide neural networks
Abstract: We investigate gradient descent training of wide neural networks and the corresponding implicit bias in function space. For 1D regression, we show that the solution of training a width-$n$ shallow ReLU network is within $n^{-1/2}$ of the function which fits the training data and whose difference from initialization has smallest 2-norm of the second derivative weighted by $1/\zeta$. The curvature penalty function $1/\zeta$ is expressed in terms of the probability distribution that is utilized to initialize the network parameters, and we compute it explicitly for various common initialization procedures. For instance, asymmetric initialization with a uniform distribution yields a constant curvature penalty, and hence the solution function is the natural cubic spline interpolation of the training data. While similar results have been obtained in previous works, our analysis clarifies important details and allows us to obtain significant generalizations. In particular, the result generalizes to multivariate regression and different activation functions. Moreover, we show that the training trajectories are captured by trajectories of spatially adaptive smoothing splines with decreasing regularization strength. This is joint work with Hui Jin. [Lecture Recording]
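For the case the abstract highlights (constant curvature penalty), the limiting function is the natural cubic spline interpolant, i.e., the interpolant minimizing $\int f''(x)^2\,dx$, with zero second derivative at the end knots. A minimal sketch of that classical construction: solve a tridiagonal system for the second derivatives at the knots, then evaluate the piecewise cubic. It computes only the spline, not the network training or the analysis in the talk; `natural_cubic_spline` is a hypothetical name.

```python
from bisect import bisect_right

def natural_cubic_spline(xs, ys):
    """Return a function evaluating the natural cubic spline through (xs, ys).

    xs must be strictly increasing with at least two points. "Natural" means the
    second derivative is zero at both end knots.
    """
    n = len(xs) - 1
    h = [xs[i + 1] - xs[i] for i in range(n)]
    M = [0.0] * (n + 1)                      # second derivatives at the knots
    m = n - 1                                # number of interior knots
    if m > 0:
        # Tridiagonal system for M[1..n-1]; row k corresponds to knot k + 1.
        sub = [h[k] for k in range(m)]
        diag = [2.0 * (h[k] + h[k + 1]) for k in range(m)]
        sup = [h[k + 1] for k in range(m)]
        rhs = [6.0 * ((ys[k + 2] - ys[k + 1]) / h[k + 1] - (ys[k + 1] - ys[k]) / h[k])
               for k in range(m)]
        # Thomas algorithm: forward elimination, then back substitution.
        for k in range(1, m):
            w = sub[k] / diag[k - 1]
            diag[k] -= w * sup[k - 1]
            rhs[k] -= w * rhs[k - 1]
        sol = [0.0] * m
        sol[-1] = rhs[-1] / diag[-1]
        for k in range(m - 2, -1, -1):
            sol[k] = (rhs[k] - sup[k] * sol[k + 1]) / diag[k]
        M[1:n] = sol

    def spline(x):
        # Locate the interval [xs[i], xs[i + 1]] containing x (clamped to the ends).
        i = min(max(bisect_right(xs, x) - 1, 0), n - 1)
        a, b, hi = xs[i + 1] - x, x - xs[i], h[i]
        return ((M[i] * a**3 + M[i + 1] * b**3) / (6.0 * hi)
                + (ys[i] / hi - M[i] * hi / 6.0) * a
                + (ys[i + 1] / hi - M[i + 1] * hi / 6.0) * b)

    return spline

xs = [0.0, 0.5, 1.5, 2.0, 3.0]
ys = [0.0, 1.0, -0.5, 0.3, 1.2]
f = natural_cubic_spline(xs, ys)
```

The evaluator clamps to the first and last intervals, so it should be used inside `[xs[0], xs[-1]]`; a natural spline proper would extend linearly beyond the end knots.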
[2/03] Michele Peruzzi (Duke Statistics)
Title: Meshed Gaussian Processes for efficient Bayesian inference of big data spatial regression models
Abstract: Big spatial data are now routinely collected in diverse scientific and data-driven industrial applications including, but not limited to, natural and environmental sciences, economics, climate science, ecology, forestry, and public health. In this talk, I will introduce Meshed Gaussian Processes (MGPs) for scalable Bayesian regression modeling of spatial Big Data. The underlying idea combines concepts from high-dimensional geostatistics: the spatial domain is partitioned, and the regions in the partition are modeled using a sparsity-inducing directed acyclic graph (DAG). Unlike other methods, MGPs treat the DAG as an explicit design choice: rather than building the DAG based on some criterion (e.g., limiting conditional dependence to the m nearest neighbors), one chooses a DAG because of its known properties. The DAG is linked to groups of spatial locations arising, e.g., from domain tiling, tessellations, or other partitioning strategies. Two DAGs, with their corresponding domain partitioning strategies, are particularly convenient: (1) a recursive tree and (2) a "cubic" mesh. I will focus on the latter and show that the resulting "cubic" MGP (QMGP) leads to efficient parallel MCMC sampling of the latent spatial process, even with spatiotemporal data at more than ten million locations. I will then mention refinements, improvements, and extensions of MGPs, and of QMGPs in particular:
(1) MCMC for QMGPs may converge slowly for irregularly spaced data and/or when estimating the covariance parameters a posteriori. I will resolve these issues by showing that a Grid-Parametrize-Split (GriPS) strategy results in much more efficient MCMC.
(2) Why MCMC, though? In some scenarios, it may be possible to fix some covariance parameters at reasonable values, in which case MCMC may be avoided. I will outline the possible computational advantages of QMGPs in these settings compared to existing alternatives.
(3) The idea of fixing the DAG allows one to devise tailor-made MCMC algorithms for sampling specific MGPs. As a result, MGPs may facilitate computations for more general regression models on (multivariate) non-Gaussian outcomes.
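The "cubic" mesh idea can be illustrated with a small sketch under explicit assumptions: points in the unit square are assigned to tiles of a regular grid, and each tile's parents in the DAG are taken to be the adjacent tiles in the previous row and previous column (an assumed layout; the exact QMGP construction may differ). Computing every tile's Markov blanket then shows that coloring tiles by row and column parity leaves no two same-colored tiles in each other's blanket, so same-colored tiles are conditionally independent given all others and could be updated in parallel within a Gibbs sweep. All function names are hypothetical.

```python
def tile_of(point, n_rows, n_cols):
    """Map a point in the unit square to its (row, col) tile on a regular grid."""
    x, y = point
    return (min(int(y * n_rows), n_rows - 1), min(int(x * n_cols), n_cols - 1))

def cubic_dag_parents(n_rows, n_cols):
    """Assumed 'cubic' DAG: tile (r, c) has parents (r - 1, c) and (r, c - 1)."""
    return {(r, c): [(pr, pc) for pr, pc in ((r - 1, c), (r, c - 1)) if pr >= 0 and pc >= 0]
            for r in range(n_rows) for c in range(n_cols)}

def markov_blankets(parents):
    """Markov blanket of each node: parents, children, and the children's other parents."""
    children = {v: [] for v in parents}
    for v, ps in parents.items():
        for p in ps:
            children[p].append(v)
    blankets = {}
    for v in parents:
        mb = set(parents[v]) | set(children[v])
        for ch in children[v]:
            mb.update(parents[ch])
        mb.discard(v)
        blankets[v] = mb
    return blankets

def color(tile):
    """Four colors from the parity of the row and column indices."""
    r, c = tile
    return 2 * (r % 2) + (c % 2)

parents = cubic_dag_parents(4, 5)
blankets = markov_blankets(parents)
# Pairs of same-colored tiles that lie in each other's Markov blanket (expected: none).
conflicts = [(v, u) for v, mb in blankets.items() for u in mb if color(u) == color(v)]
# Tiles grouped by color: each group could, in principle, be updated in parallel.
groups = {k: sorted(v for v in parents if color(v) == k) for k in range(4)}
print(tile_of((0.99, 0.10), 4, 5), len(conflicts))  # prints (0, 4) 0
```

Under this assumed DAG each interior tile's blanket has six members (two parents, two children, and two co-parents on the anti-diagonal), and all of them differ from the tile in row parity, column parity, or both.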
[2/10] Abdelmonem Afifi (UCLA Biostatistics)
Title: COVID-19 Vaccines: What We Know and What We Don’t
Abstract: The emergence of vaccines in late 2020 has shone a bright light at the end of the long and dark COVID-19 tunnel. In this seminar, I summarize what I wanted to know about these vaccines. I begin by describing the different types of vaccines that have appeared or are under development. I describe the FDA approval process, particularly as it relates to the Pfizer-BioNTech and Moderna vaccines. I discuss the process and potential consequences of vaccine distribution in the USA, including herd immunity and what it takes to reach it. I conclude by speculating on what is next for the course of the pandemic.
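For the herd immunity part, the usual back-of-the-envelope calculation assumes homogeneous mixing, no prior immunity, and a vaccine that makes a fraction E of recipients fully immune (including against transmission). Immunity in a fraction q of the population gives an effective reproduction number $R_0(1-q)$, which drops below 1 when $q > 1 - 1/R_0$; the required vaccination coverage is then $(1 - 1/R_0)/E$. The sketch below encodes only this textbook approximation, and the R0 and efficacy values are illustrative rather than taken from the seminar.

```python
def herd_immunity_threshold(r0):
    """Immune fraction at which R_eff = R0 * (1 - fraction) falls to 1 (homogeneous mixing)."""
    return max(0.0, 1.0 - 1.0 / r0)

def required_coverage(r0, efficacy):
    """Vaccine coverage needed if a fraction `efficacy` of vaccinees become fully immune."""
    return herd_immunity_threshold(r0) / efficacy

for r0 in (2.5, 3.0, 4.0):          # illustrative values, not estimates
    cov = required_coverage(r0, efficacy=0.95)
    print(f"R0={r0}: threshold={herd_immunity_threshold(r0):.3f}, coverage={cov:.3f}")
```

A coverage above 1.0 indicates that, under these assumptions, vaccination alone could not reach the threshold.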
[2/17] Vladimir Minin (UCI Statistics)
Title: Using multiple data streams to estimate and forecast SARS-CoV-2 transmission dynamics
Abstract: Monitoring of transmission dynamics was critical to interrupting the initial spread of the novel coronavirus (SARS-CoV-2) and mitigating morbidity and mortality caused by the coronavirus disease (COVID-19). Formulating a regional mechanistic model of SARS-CoV-2 transmission dynamics and frequently estimating parameters of this model using streaming surveillance data offers one way to accomplish data-driven implementation of mitigation strategies. However, such parameter estimation can be imprecise, because surveillance data are noisy and not informative about all aspects of the mechanistic model, even for reasonably parsimonious epidemic models. To overcome this obstacle, at least partially, we propose a Bayesian modeling framework that integrates multiple surveillance data streams. Our model uses both COVID-19 incidence and mortality time series to estimate the model parameters. Importantly, our data generating model for incidence data takes into account changes in the total number of tests performed. We apply our Bayesian data integration method to COVID-19 surveillance data collected in Orange County, California and estimate changes in transmission dynamics during the course of the pandemic. [Lecture Recording]
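The structure of a multi-stream data-generating model can be illustrated with a toy forward simulation: a deterministic discrete-time SEIR model produces daily incidence, expected reported cases are a fraction of incidence that rises with the number of tests performed, and expected deaths are a delayed fraction of incidence. The compartments, the detection curve $1 - e^{-\kappa \cdot \text{tests}}$, the fixed death delay, and every parameter name are assumptions for illustration, not the authors' model; the Bayesian step that would fit parameters such as `beta`, `ifr`, and `kappa` to observed case and death counts is omitted.

```python
import math

def simulate_streams(n_days, N, beta, sigma, gamma, ifr, death_delay, tests, kappa, i0=10.0):
    """Toy deterministic SEIR model with two expected data streams.

    Returns (incidence, expected_cases, expected_deaths, final_state). All
    observation assumptions here are hypothetical.
    """
    S, E, I, R = N - i0, 0.0, i0, 0.0
    incidence, cases, deaths = [], [], []
    for t in range(n_days):
        new_exposed = beta * S * I / N      # S -> E
        new_infectious = sigma * E          # E -> I, counted as daily incidence
        new_removed = gamma * I             # I -> R
        S -= new_exposed
        E += new_exposed - new_infectious
        I += new_infectious - new_removed
        R += new_removed
        incidence.append(new_infectious)
        # Hypothetical detection curve: the reported fraction rises with tests[t].
        cases.append((1.0 - math.exp(-kappa * tests[t])) * new_infectious)
        # Expected deaths: a fraction ifr of incidence from death_delay days earlier.
        deaths.append(ifr * incidence[t - death_delay] if t >= death_delay else 0.0)
    return incidence, cases, deaths, (S, E, I, R)

tests = [1000 + 50 * t for t in range(60)]     # made-up, increasing test counts
inc, cases, deaths, state = simulate_streams(
    n_days=60, N=3_000_000, beta=0.3, sigma=0.2, gamma=0.1,
    ifr=0.005, death_delay=14, tests=tests, kappa=1e-4)
```

In this toy setup, a rise in reported cases can come from either more infections or more tests, while the death stream depends only on infections, which is one way to see why modeling test counts and combining both streams can help separate the two.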
[2/24] Brian Wells (UCLA Center for Health Policy Research)
Title: Evaluating the Redesign of the California Health Interview Survey
Abstract: The decline of telephone surveys due to low response rates and cultural shifts in phone usage has motivated many surveys to consider implementing major methodological changes to help with response and cost. Following an extensive evaluation and experimentation process, the California Health Interview Survey (CHIS) fully transitioned in 2019 to a mixed-mode survey design (mail push-to-web survey with a telephone nonresponse follow-up) using an address-based sampling frame. In addition to the base design, CHIS implemented a number of other sample and survey design changes to improve representation. These changes included rearranging the order of the Child and Adult surveys, using non-English focused mailings, and applying machine learning methods along with third-party data sources to help sample underrepresented populations through predictive modeling.
While improvements in response and cost are important, a concern for many CHIS data users was the impact the redesign would have on trends in key outcomes. To help understand these changes, CHIS conducted an evaluation to determine whether observed shifts in trends reflected actual changes in the population over time or methodological modifications. Two particular concerns were differences in sample composition due to the sampling frame and mode, and measurement differences due to mode.
This presentation will explore where each of these methodological design changes helped improve CHIS data collection, and where additional refinements are needed. It will also examine key indicators across multiple domains (including health conditions, health behaviors, and health care) to determine if there is evidence to support a break in trend and where trends should be interpreted with caution.
[3/03] John Boscardin (UCSF Biostatistics) [This seminar is postponed to a new date (TBD)]