Seminars


Seminars are held on Thursdays at 4:00 pm in room 203 TMCB.
 

2019-11-21 - Antonio Villanueva

Presenter:

Antonio Villanueva

Title:

Affiliation:

Statistics Department, Chapingo Autonomous University

Date:

Abstract:

Website:

2019-11-14 - Jennifer Sinnott

Presenter:

Jennifer Sinnott

Title:

Affiliation:

Ohio State University

Date:

Abstract:

Website:

2019-11-07 - Fernando Quintana

Presenter:

Fernando Quintana

Title:

Affiliation:

Pontificia Universidad Católica de Chile

Date:

Abstract:

Website:

2019-10-10 - Wes Johnson

Presenter:

Wes Johnson

Title:

Affiliation:

Department of Statistics, University of California Irvine

Date:

Abstract:

Website:

2019-10-03 - John Lawson

Presenter:

John Lawson

Title:

Affiliation:

Department of Statistics, Brigham Young University

Date:

Abstract:

Website:

2019-09-26 - Matt Heiner

Presenter:

Matt Heiner

Title:

Affiliation:

Department of Statistics, Brigham Young University

Date:

Abstract:

Website:

2019-09-19 - Adam Smith - Bayesian Analysis of Partitioned and Large-Scale Demand Models

Presenter:

Adam Smith

Title:

Bayesian Analysis of Partitioned and Large-Scale Demand Models

Affiliation:

UCL School of Management, University College London

Date:

September 19, 2019

Abstract:

The analysis of consumer purchase behavior is a core component of marketing and economic research, but becomes challenging with large product assortments. I discuss two approaches for estimating demand models with a high-dimensional set of products. The first approach is based on partitioning demand: these models assume that products can be categorized into groups and then define consumer substitution patterns at the group level rather than the product level. While this can significantly reduce the dimension of the parameter space, it can also lead to inaccurate inferences if the product categories do not match the structure of consumer preferences. To overcome this problem, I let the partition be a model parameter and propose a Bayesian method for inference. The second approach is based on regularization: I propose a new class of shrinkage priors for price elasticities in high-dimensional demand models. The prior has a hierarchical structure where the direction and rate of shrinkage depend on the information in a product classification tree. Both approaches are illustrated with store-level scanner data, and the effects on demand predictions and product competition are discussed.

Website:

Adam Smith

2019-04-04 - Daniel Apley - Understanding the Effects of Predictor Variables in Black-Box Supervised Learning Models

Presenter:

Daniel Apley

Title:

Understanding the Effects of Predictor Variables in Black-Box Supervised Learning Models

Affiliation:

Northwestern University

Date:

April 4, 2019

Abstract:

For many supervised learning applications, understanding and visualizing the effects of the predictor variables on the predicted response is of paramount importance. A shortcoming of black-box supervised learning models (e.g., complex trees, neural networks, boosted trees, random forests, nearest neighbors, local kernel-weighted methods, support vector regression, etc.) in this regard is their lack of interpretability or transparency. Partial dependence (PD) plots, which are the most popular general approach for visualizing the effects of the predictors with black-box supervised learning models, can produce erroneous results if the predictors are strongly correlated, because they require extrapolation of the response at predictor values that are far outside the multivariate envelope of the training data. Functional ANOVA for correlated inputs can avoid this extrapolation but involves prohibitive computational expense and a subjective choice of an additive surrogate model to fit to the supervised learning model. We present a new visualization approach that we term accumulated local effects (ALE) plots, which have a number of advantages over existing methods. First, ALE plots do not require unreliable extrapolation with correlated predictors. Second, they are orders of magnitude less computationally expensive than PD plots, and many orders of magnitude less expensive than functional ANOVA. Third, they yield convenient variable importance/sensitivity measures that possess a number of desirable properties for quantifying the impact of each predictor.

Website:

Dr. Apley's Website

2019-03-28 - Jeff Miller - Flexible perturbation models for robustness to misspecification

Presenter:

Dr. Jeff Miller

Title:

Flexible perturbation models for robustness to misspecification

Affiliation:

Harvard

Date:

March 28, 2019

Abstract:

In many applications, there are natural statistical models with interpretable parameters that provide insight into questions of interest. While useful, these models are almost always wrong in the sense that they only approximate the true data generating process. In some cases, it is important to account for this model error when quantifying uncertainty in the parameters. We propose to model the distribution of the observed data as a perturbation of an idealized model of interest by using a nonparametric mixture model in which the base distribution is the idealized model. This provides robustness to small departures from the idealized model and, further, enables uncertainty quantification regarding the model error itself. Inference can easily be performed using existing methods for the idealized model in combination with standard methods for mixture models. Remarkably, inference can be even more computationally efficient than in the idealized model alone, because similar points are grouped into clusters that are treated as individual points from the idealized model. We demonstrate with simulations and an application to flow cytometry.

Website:

Dr. Miller's Website

2019-03-21 - Yue Zhang - Multi-state Approach for Studying Cancer Care Continuum using EHR data

Presenter:

Dr. Yue Zhang

Title:

Multi-state Approach for Studying Cancer Care Continuum using EHR data

Affiliation:

University of Utah

Date:

March 21, 2019

Abstract:

Diagnostic evaluation of suspected breast cancer due to abnormal screening mammography results is common, creates anxiety for women and is costly for the healthcare system. Timely evaluation with minimal use of additional diagnostic testing is key to minimizing anxiety and cost. In this paper, we propose a Bayesian semi-Markov model that allows for flexible, semi-parametric specification of the sojourn time distributions and apply our model to an investigation of the process of diagnostic evaluation with mammography, ultrasound and biopsy following an abnormal screening mammogram. We also investigate risk factors associated with the sojourn time between diagnostic tests. By utilizing semi-Markov processes, we expand on prior work that described the timing of the first test received by providing additional information such as the mean time to resolution and proportion of women with unresolved mammograms after 90 days for women requiring different sequences of tests in order to reach a definitive diagnosis. Overall, we found that older women were more likely to have unresolved positive mammograms after 90 days. Differences in the timing of imaging evaluation and biopsy were generally on the order of days and thus did not represent clinically important differences in diagnostic delay.

Website:

Dr. Zhang's Webpage

2019-03-14 - Dennis Tolley - DATA: Whence it Came…Where it’s Going

Presenter:

Dr. Dennis Tolley

Title:

DATA: Whence it Came…Where it’s Going

Affiliation:

BYU

Date:

March 14, 2019

Abstract:

A defining activity of statisticians is the handling, processing, analyzing and interpreting of data. With “big data” upon us, it is sometimes easy to forget some basic principles in the use of data. In this seminar I review some basic guidelines regarding data that apply before one actually begins to physically process the data files. I also review some guidelines based on the ultimate use of the results that assist in how a statistician will formulate a methodology and carry out the analysis. Application of these guidelines is illustrated with a simple problem in liquid chromatography that gives rise to a family of random walk models. These models, in turn, lay the foundation for a family of research problems in statistics.

Website:

Dr. Tolley's Website

2019-03-07 - Grant Schultz - Utah Crash Prediction Models: A Joint Effort for Success

Presenter:

Dr. Grant Schultz

Title:

Utah Crash Prediction Models: A Joint Effort for Success

Affiliation:

BYU

Date:

March 7, 2019

Abstract:

The Utah Department of Transportation (UDOT) continues to advance the safety of the state roadway network through its participation in and endorsement of the “Zero Fatalities: A Goal We Can All Live With™” campaign to increase awareness of the importance of highway safety. As a continuing effort by UDOT to advance the safety of its roadway network, research has been conducted wherein statistical models have been developed that allow users to evaluate the safety of roadways within the state. Three models have been developed by a team of Civil and Environmental Engineering and Statistics faculty and students. These models include the Utah Crash Prediction Model (UCPM), the Utah Crash Severity Model (UCSM), and the Utah Intersection Crash Prediction Model (UICPM). Using the output from these models, UDOT Safety Programs engineers, Region directors, and other interested users have access to data that will allow them to make informed decisions related to prioritizing highway safety projects and programs within the state of Utah.

Website:

Dr. Schultz's Webpage

2019-02-28 - Ephraim Hanks - Random walk spatial models for spatially correlated genetic data

Presenter:

Dr. Ephraim Hanks

Title:

Random walk spatial models for spatially correlated genetic data

Affiliation:

Penn State

Date:

February 28, 2019

Abstract:

Landscape genetics is the study of how landscape features, like rivers, mountains, and roads, influence genetic connectivity of wildlife populations. We build models for spatial genetic correlation based on spatio-temporal models for how animals move across the landscape. This approach provides insights into common spatial models, such as simultaneous autoregressive (SAR) models and common Matérn covariance models. It also allows for scientific interpretation of spatial covariance parameters. We illustrate this approach in a study of brook trout, where we provide the first parametric description of how stream characteristics influence genetic connectivity.

Website:

Dr. Hanks' Website

2019-02-21 - Michele Guindani - Bayesian Approaches to Dynamic Model Selection

Presenter:

Michele Guindani

Title:

Bayesian Approaches to Dynamic Model Selection

Affiliation:

University of California, Irvine

Date:

February 21, 2019

Abstract:

In many applications, investigators monitor processes that vary in space and time, with the goal of identifying temporally persistent and spatially localized departures from a baseline or “normal” behavior. In this talk, I will first discuss a principled Bayesian approach for estimating time-varying functional connectivity networks from brain fMRI data. Dynamic functional connectivity, i.e., the study of how interactions among brain regions change dynamically over the course of an fMRI experiment, has recently received wide interest in the neuroimaging literature. Our method utilizes a hidden Markov model for classification of latent neurological states, achieving estimation of the connectivity networks in an integrated framework that borrows strength over the entire time course of the experiment. Furthermore, we assume that the graph structures, which define the connectivity states at each time point, are related within a super-graph, to encourage the selection of the same edges among related graphs. Then, I will propose a Bayesian nonparametric model selection approach with an application to the monitoring of pneumonia and influenza (P&I) mortality, to detect influenza outbreaks in the continental United States. More specifically, we introduce a zero-inflated conditionally identically distributed species sampling prior, which allows borrowing information across time and assigning data to clusters associated with either a null or an alternate process. Spatial dependences are accounted for by means of a Markov random field prior, which allows the selection to be informed by inferences conducted at nearby locations. We show how the proposed modeling framework performs in an application to the P&I mortality data and in a simulation study, and compare it with common threshold methods for detecting outbreaks over time, with more recent Markov switching based models, and with other Bayesian nonparametric priors that do not take into account spatio-temporal dependence.

Website:

Dr. Guindani's Website

2019-02-14 - Garritt Page - Temporal and Spatio-Temporal Random Partition Models

Presenter:

Dr. Garritt Page

Title:

Temporal and Spatio-Temporal Random Partition Models

Affiliation:

BYU

Date:

February 14, 2019

Abstract:

The number of scientific fields that regularly collect data that are temporally and spatially referenced continues to experience rapid growth. An intuitive feature in data that are spatio-temporal is that measurements taken on experimental units near each other in time and space tend to be similar. Because of this, many methods developed to accommodate spatio-temporal dependent structures perform a type of implicit grouping based on time and space. Rather than implicitly grouping observations through a type of smoothing, we develop a class of dependent random partition models that explicitly models spatio-temporal clustering. This model can be thought of as a joint distribution for a sequence of random partitions indexed by time and space. We first detail how temporal dependence is incorporated so that partitions evolve gently over time. Then a few properties of the joint model are derived and induced dependence at the observation level is explored. Afterwards, we demonstrate how space can be integrated. Computation strategies are detailed and we apply the method to Chilean standardized testing scores.

Website:

Dr. Page's Website

2019-02-07 - Gil Fellingham - Predicting Performance of Professional Golfers

Presenter:

Dr. Gil Fellingham

Title:

Predicting Performance of Professional Golfers

Affiliation:

BYU

Date:

February 7, 2019

Abstract:

Many statisticians agree that building models that predict well should be a high priority (Harville, 2014; Stern, 2014; Berry and Berry, 2014). The purpose of this paper is to test the predictive ability of various Bayesian models using a group of closely matched members of the Professional Golf Association (PGA). Predicting performance of PGA golfers is a notoriously difficult task. We fit six different models to scores produced by 22 PGA golfers playing on 18 different golf courses in 2014. We then use these models to predict scores for the same golfers and golf courses as well as other golfers and other courses in 2015. We varied model complexity across two different dimensions. In one dimension we fit model intercepts using parametric Bayesian, nonparametric Bayesian, and hierarchical Bayesian methods. In the other dimension, we either included covariates for driving distance, greens hit in regulation, and difficulty of course as measured by slope, or we did not include the covariates. Preliminary results indicate that nonparametric Bayesian methods seem marginally better.

Website:

Dr. Fellingham's Webpage
