Seminars


Seminars are held on Thursdays at 4:00 pm in room 203 TMCB.
 

2019-12-05 - Derek Tucker - Elastic Functional Data Analysis

Presenter:

Derek Tucker

Title:

Elastic Functional Data Analysis

Affiliation:

Sandia National Labs

Date:

2019-12-05

Abstract:

Functional data analysis (FDA) is an important research area, due to its broad applications across many disciplines where functional data is prevalent. An essential component in solving these problems is the registration of points across functional objects. Without proper registration, the results are often inferior and difficult to interpret. The current practice in the FDA literature is to treat registration as a pre-processing step, using off-the-shelf alignment procedures, and follow it up with statistical analysis of the resulting data. In contrast, an Elastic framework is a more comprehensive approach, where one solves for the registration and statistical inferences in a simultaneous fashion. Our goal is to use a metric with appropriate invariance properties, to form objective functions for alignment and to develop statistical models involving functional data. While these elastic metrics are complicated in general, we have developed a family of square-root transformations that map these metrics into simpler Euclidean metrics, thus enabling more standard statistical procedures. Specifically, we have developed techniques for elastic functional PCA, elastic tolerance bounds, and elastic regression models involving functional variables. I will demonstrate these ideas using simulated data and real data from various sources.
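To make the key step concrete, here is the best-known member of that family of transformations, stated as a standard fact rather than as a detail of the talk: for a function $f$ with derivative $\dot f$, the square-root slope (square-root velocity) function is

\[
q(t) = \operatorname{sign}\bigl(\dot f(t)\bigr)\sqrt{\lvert \dot f(t)\rvert},
\]

and under this map the elastic (Fisher–Rao) distance between two functions becomes, after optimizing over warpings $\gamma$, an ordinary $\mathbb{L}^2$ distance $\inf_{\gamma}\lVert q_1 - (q_2\circ\gamma)\sqrt{\dot\gamma}\rVert_2$, so that registration and the subsequent statistics (PCA, tolerance bounds, regression) can be carried out with Euclidean machinery.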

J. Derek Tucker is a Principal Member of the Technical Staff at Sandia National Laboratories. He received his B.S. in Electrical Engineering cum laude and M.S. in Electrical Engineering from Colorado State University in 2007 and 2009, respectively. In 2014 he received a Ph.D. degree in Statistics from Florida State University in Tallahassee, FL, under the co-advisement of Dr. Anuj Srivastava and Dr. Wei Wu. He is currently leading research projects in the areas of satellite image registration and point process modeling for monitoring applications. His research is focused on pattern-theoretic approaches to problems in image analysis, computer vision, signal processing, and functional data analysis. In 2017, he received the Director of National Intelligence Team Award for his contributions to the Signal Location in Complex Environments (SLiCE) team.

Website:

Tucker

2019-11-21 - Antonio Villanueva - Modified Pseudo-Likelihood Estimation for Markov Random Fields on Lattice

Presenter:

Antonio Villanueva

Title:

Modified Pseudo-Likelihood Estimation for Markov Random Fields on Lattice

Affiliation:

Statistics Department, Chapingo Autonomous University

Date:

2019-11-21

Abstract:

The probability function of spatial statistical models involves, in general, an extremely awkward normalizing function of the parameters, known as the partition function in statistical mechanics, with the consequence that a direct approach to statistical inference through maximum likelihood (ML) is rarely possible. In order to avoid such intractability, Besag (1975) introduced an alternative technique known as the method of maximum pseudo-likelihood (MPL), owing to its merit of being easy to implement. The maximum pseudo-likelihood estimator (MPLE) is the value of the parameter that maximizes the pseudo-likelihood, defined as the direct product of conditional probabilities or conditional probability densities of the variable at each site. It has been mathematically demonstrated that, under suitable conditions, the MPLEs are strongly consistent and asymptotically normally distributed around the true parameter value for large samples of various spatial processes. On the other hand, the MPL method trades away efficiency for computational ease. It has been shown that in many situations the MPLE is not efficient in comparison with the ML estimator (MLE). According to these studies, the MPLEs are as good as the MLEs in the weak interaction case, but the difference between the two becomes substantial when spatial interactions are strong.
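To make the definition concrete, here is the standard form in generic notation (notation assumed for illustration, not taken from the abstract): for a random field observed at sites $i = 1, \dots, n$ with neighborhoods $N(i)$ and parameter $\theta$, the pseudo-likelihood replaces the intractable joint density with the product of full conditionals

\[
\mathrm{PL}(\theta) = \prod_{i=1}^{n} p\bigl(x_i \mid x_{N(i)}; \theta\bigr),
\]

and the MPLE is the maximizer $\hat{\theta}_{\mathrm{MPL}} = \arg\max_{\theta} \mathrm{PL}(\theta)$. Each factor involves only a one-dimensional normalizing constant, which is why the method avoids the partition function and is easy to implement.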

Huang and Ogata (2002) addressed the problem of improving the efficiency of MPLEs while still keeping the technique computationally feasible and proposed the maximum generalized pseudo-likelihood (MGPL) method for Markov random field (MRF) models on lattice. The MGPL estimator (MGPLE) is the value of the parameter that maximizes the generalized pseudo-likelihood function (GPL). This GPL is the multivariate version of Besag's pseudo-likelihood, which is constructed first by defining a group of adjacent sites for each site in the lattice and then taking the product of the multivariate conditional probability distributions (MCPD) of the groups of random variables defined on each group of adjacent sites. Simulation results for an Ising and two auto-normal models on a region of square lattice showed better performance of the MGPLE than the MPLE, and the performance became better as the size of the groups of adjacent sites increased. On the other hand, it was observed that as the size of the groups of adjacent sites increased, the computing complexity for the MGPLE increased exponentially due to the presence of a normalizing integral (a sum in the case of discrete site variables) in the expression for each MCPD, which has to be evaluated over the entire support of the joint distribution for groups of site variables in each case. Because of this, for continuous MRFs other than auto-normal and discrete MRFs with site variables assuming more than two values, an enormous effort might be required, making the implementation of the MGPL method practically infeasible even for small square lattices. For example, in MRFs where each site variable, conditional on its neighbors, follows the distribution of a Winsorized Poisson random variable (Kaiser and Cressie, 1997), the computation of the normalizing integrals rapidly becomes prohibitive with the size of the groups of adjacent sites even for small square lattices, as the support of this distribution may be in the hundreds (or thousands).

In our research we propose a conditional pairwise pseudo-likelihood (CPPL) for parameter estimation in Markov random fields on lattice. The CPPL is defined as the direct product of conditional pairwise distributions corresponding to the pairs of random variables associated with the cliques of size two from the collection of spatial locations on a region of a lattice. Thus the CPPL is a modified version of Besag's pseudo-likelihood (PL) and Huang and Ogata's generalized pseudo-likelihood (GPL) in that it is not constructed by defining a group of adjacent sites for each site in the lattice. We carry out simulation studies of the correspondingly defined maximum conditional pairwise pseudo-likelihood estimator (MCPPLE) for Markov random fields with Winsorized Poisson conditional distributions on the lattice. These studies show that the MCPPLE has significantly better performance than Besag's maximum pseudo-likelihood estimator (MPLE), and it is almost as easy to implement as the MPLE. Therefore, we suggest that for situations where each discrete local random variable conditional on its neighbors assumes more than two possible values, as in the Winsorized Poisson case, estimation based on the CPPL may be a computationally more feasible alternative than estimation based on Huang and Ogata's GPL.
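As a hedged sketch of how the three criteria differ (again in generic notation; the precise conditioning sets are given in the works cited above): Huang and Ogata's GPL multiplies multivariate conditional densities of blocks $x_{G(i)}$ of adjacent sites given the remaining sites, whereas the CPPL multiplies bivariate conditional densities over the cliques $\{i,j\}$ of size two,

\[
\mathrm{GPL}(\theta) = \prod_{i=1}^{n} p\bigl(x_{G(i)} \mid x_{-G(i)}; \theta\bigr),
\qquad
\mathrm{CPPL}(\theta) = \prod_{\{i,j\}} p\bigl(x_i, x_j \mid x_{-\{i,j\}}; \theta\bigr).
\]

The practical point is the normalizing constant in each factor: for the GPL it is a sum or integral over the joint support of a whole block (which grows rapidly for, e.g., Winsorized Poisson site variables), while for the CPPL it is only a double sum or integral over two site variables, keeping the computation close to that of Besag's PL.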

Website:

2019-11-14 - Jennifer Sinnott - Genetic Association Testing with Imperfect Phenotypes Derived From Electronic Health Records

Presenter:

Jennifer Sinnott

Title:

Genetic Association Testing with Imperfect Phenotypes Derived From Electronic Health Records

Affiliation:

Ohio State University/University of Utah

Date:

2019-11-14

Abstract:

Electronic health records linked to blood samples form a powerful new data resource that can provide much larger, more diverse samples for testing associations between genetic markers and disease. However, algorithms for estimating certain phenotypes, especially those that are complex and/or difficult to diagnose, produce outcomes subject to measurement error. Much work is needed to determine best practices for implementing and analyzing such data. To this end, we recently proposed a method for analyzing case-control studies when disease status is estimated by a phenotyping algorithm; our method improves power and eliminates bias when compared to the standard approach of dichotomizing the algorithm prediction and analyzing the data as though case-control status were known perfectly. The method relies on knowing certain qualities of the algorithm, such as its sensitivity, specificity, and positive predictive value, but in practice these may not be known if no “gold-standard” phenotypes are known in the population. A common setting where that occurs is in phenome-wide association studies (PheWASs), in which a wide range of phenotypes are of interest, and all that is available for each phenotype is a surrogate measure, such as the number of billing codes for that disease. We proposed a method to perform genetic association tests in this setting, which improves power over existing methods that typically identify cases based on thresholding the number of billing codes. In this talk, I will describe these methods, and present applications to studies of rheumatoid arthritis in the Partners Healthcare System.

Website:

2019-11-05 - Jacob Mortensen - Statistical Methods for Modeling Movement

Presenter:

Jacob Mortensen

Title:

Statistical Methods for Modeling Movement

Affiliation:

Simon Fraser University

Date:

2019-11-05

Abstract:

In recent years, tracking data has become widespread, allowing researchers to model movement at a very high level of detail. In this talk I will present two examples of statistical research inspired by this type of data. In the first, I present a method for nonparametric estimation of continuous-state Markov transition densities. Our approach uses a Poisson point process to represent the joint transition space, then divides that process by the marginal intensity to estimate the conditional transition density. Modeling a transition density as a point process creates a general framework that admits a wide variety of implementations, depending on suitability for a given application and at the discretion of the modeler. A key feature of this point process representation is that it allows the presence of spatial structure to inform transition density estimation. We illustrate this by using our method to model ball movement in the National Basketball Association, enabling us to capture the effects of spatial features, such as the three-point line, that impact transition density values. In the second, I will show how broadcast-derived tracking data can be used to estimate external load metrics in sports science. Sports scientists use high resolution coordinate data to estimate external load metrics, such as acceleration load and high speed running distance, traditionally used to understand the physical toll a game takes on an athlete. Unfortunately, collecting this data requires installation of expensive hardware and paying costly licensing fees to data providers, restricting its availability. Algorithms have been developed that allow a traditional broadcast feed to be converted to x-y coordinate data, making tracking data easier to acquire, but coordinates are available for an athlete only when that player is within the camera frame. Obviously, this leads to inaccuracies in external load estimates, limiting the usefulness of this data for sports scientists. In this research, we develop models that predict offscreen load metrics and demonstrate the viability of broadcast-derived tracking data for understanding external load in soccer.
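The first method can be summarized in one line (notation assumed here for illustration): if observed transitions $(x, y)$ are modeled as a Poisson point process with intensity $\lambda(x, y)$ over the joint transition space, the estimated transition density is the joint intensity normalized by the marginal intensity at the current state,

\[
\hat{p}(y \mid x) \;=\; \frac{\hat{\lambda}(x, y)}{\int \hat{\lambda}(x, y')\, dy'},
\]

so any spatial structure built into the intensity estimate, such as elevated intensity near the three-point line, propagates directly into the estimated transition density.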

Website:

2019-10-31 - Nathan Sandholtz - Modeling human decision-making in spatio-temporal systems: An observational and an experimental case study

Presenter:

Nathan Sandholtz

Title:

Modeling human decision-making in spatio-temporal systems: An observational and an experimental case study

Affiliation:

Simon Fraser University

Date:

2019-10-31

Abstract:

In this talk I present two contrasting analyses of human decision-making behavior in spatio-temporal systems. In the first case, we examine player shooting decisions in professional basketball. We assume that all players operate under the same objective function on offense, namely maximizing their team's total expected points. Our goal is to identify areas where, conditional on location, lineups exhibit potential inefficiencies in allocating shots among their players. We do this by comparing a player's probability of making a shot to the rate at which he shoots, in the context of both his four teammates on the court and the spatial distribution of his shots. While on average players are highly efficient with respect to the shot allocation metrics we introduce, nearly every lineup exhibits some degree of potential inefficiency. We estimate and visualize the points that are potentially lost and identify which players are responsible.

In the second case, we analyze an experiment in which subjects were tasked with maximizing a reward in a simple "hotspot" computer game. As in the basketball example, subjects made decisions to maximize a point total. However, unlike shots in a basketball game, this task was specifically designed to induce uncertainty about the effect an action has on the subsequent point outcome. This forced subjects to balance exploration and exploitation in their strategy. Our analysis shows that subjects exhibit vastly different preferences regarding the exploration vs. exploitation tradeoff. For this reason, we cannot assume a global strategy which all subjects follow. On the contrary, inferring each subject's latent strategy (or acquisition function, as it is referred to in the Bayesian optimization literature) actually becomes the primary goal of our research. We find that the classical suite of acquisition functions doesn't adequately explain every subject's behavior, and we propose a modification to this suite of acquisition functions which better explains the subjects' collective set of decisions.
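For readers unfamiliar with the term, one member of the classical suite of acquisition functions is the upper confidence bound rule, shown here purely as an illustration of the idea (it is not claimed to be the form used in the study): with a Gaussian-process model of the reward surface having posterior mean $\mu_t(x)$ and standard deviation $\sigma_t(x)$ after $t$ plays, a UCB subject with exploration weight $\beta$ would choose

\[
x_{t+1} = \arg\max_{x}\; \mu_t(x) + \beta\, \sigma_t(x),
\]

so larger $\beta$ encodes a stronger preference for exploration over exploitation; inferring a subject's latent acquisition function amounts to asking which such rule, and which weights, best rationalize the observed sequence of choices.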

Website:

2019-10-24 - Alex Petersen - Partial Separability and Graphical Models for Multivariate Functional Data

Presenter:

Alex Petersen

Title:

Partial Separability and Graphical Models for Multivariate Functional Data

Affiliation:

University of California, Santa Barbara

Date:

2019-10-24

Abstract:

Graphical models are a ubiquitous tool for identifying dependencies among components of high-dimensional multivariate data. Recently, these tools have been extended to estimate dependencies between components of multivariate functional data by applying multivariate methods to the coefficients of truncated basis expansions. A key difficulty compared to multivariate data is that the covariance operator is compact, and thus not invertible. In this talk, we will discuss a property called partial separability that circumvents the invertibility issue and identifies the functional graphical model with a countable collection of finite-dimensional graphical models. This representation allows for the development of simple and intuitive estimators. Finally, we will demonstrate the empirical findings of our method through simulation and analysis of functional brain connectivity during a motor task.
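Roughly, and as a hedged paraphrase of the published construction rather than of this abstract: partial separability posits a common orthonormal basis $\{\varphi_l\}_{l \ge 1}$ such that the score vectors

\[
\theta_l = \bigl(\langle X_1, \varphi_l\rangle, \dots, \langle X_p, \varphi_l\rangle\bigr)^{\top}, \qquad l = 1, 2, \dots,
\]

are uncorrelated across $l$. The functional graphical model is then characterized by the finite-dimensional precision matrices $\Omega_l = \mathrm{Cov}(\theta_l)^{-1}$, one per basis index, which is the countable collection of finite-dimensional graphical models referred to above; estimation reduces to fitting standard Gaussian graphical models to the leading score vectors.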

Website:

2019-10-17 - Jonathan Blake - A Risk Manager’s Guide to a Faith Journey

Presenter:

Jonathan Blake

Title:

A Risk Manager’s Guide to a Faith Journey

Affiliation:

Hanover

Date:

2019-10-17

Abstract:

Jonathan Blake, recipient of the college Alumni Achievement Award, will give a guest lecture on Thursday, October 17. Entitled “A Risk Manager’s Guide to a Faith Journey,” the lecture will take place at 11 a.m. in room 1170 of the Talmage Building. The public is invited to attend the event. For over twenty years, Blake has been employed in a variety of actuarial roles. He is currently the Vice President and Lead Actuary at The Hanover Insurance Group. In this position, he assesses the financial strength of a domestic reserve position of over three billion dollars. Blake is also a partner with the Personal, Commercial, and Specialty business units, where he helps these units pursue strategic initiatives for profitable growth. Blake graduated magna cum laude with a B.S. and M.S. from the Department of Statistics in the College of Physical and Mathematical Sciences. He has previously worked in Ohio, Texas, and Massachusetts and has served as president, vice president, and board member of the Casualty Actuaries of New England. He is currently a Fellow of the Casualty Actuarial Society and a member of the American Academy of Actuaries. Blake and his wife, Julia, have six children. Each year, every college on Brigham Young University campus honors one alumnus or alumna with this prestigious award. Blake received the 2019 Alumni Achievement Award from the College of Physical and Mathematical Sciences.

Website:

2019-10-10 - Wes Johnson - Gold Standards are Out and Bayes is In: Implementing the Cure for Imperfect Reference Tests in Diagnostic Accuracy Studies

Presenter:

Wes Johnson

Title:

Gold Standards are Out and Bayes is In: Implementing the Cure for Imperfect Reference Tests in Diagnostic Accuracy Studies

Affiliation:

Department of Statistics, University of California Irvine

Date:

October 10, 2019

Abstract:

Bayesian mixture models, often termed latent class models, allow users to estimate the diagnostic accuracy of tests and true prevalence in one or more populations when the positive and/or negative reference standards are imperfect. Moreover, they allow the data analyst to show the superiority of a novel test over an old test, even if this old test is the (imperfect) reference standard. We use published data on Toxoplasmosis in pigs to explore the effects of numbers of tests, numbers of populations, and dependence structure among tests to ensure model (local) identifiability. We discuss and make recommendations about the use of priors, sensitivity analysis, model identifiability and study design options, and strongly argue for the use of Bayesian mixture models as a logical and coherent approach for estimating the diagnostic accuracy of two or more tests.
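A minimal sketch of the latent class idea for two conditionally independent tests in a single population (notation assumed for illustration: prevalence $\pi$, sensitivities $Se_k$, specificities $Sp_k$): the probability of an observed result pattern $(t_1, t_2) \in \{0,1\}^2$ mixes over the unobserved true disease status,

\[
P(T_1 = t_1, T_2 = t_2) \;=\; \pi \prod_{k=1}^{2} Se_k^{\,t_k} (1 - Se_k)^{1 - t_k} \;+\; (1 - \pi) \prod_{k=1}^{2} (1 - Sp_k)^{\,t_k}\, Sp_k^{\,1 - t_k}.
\]

With one population and two tests there are five parameters but only three degrees of freedom in the data, so the model is not identifiable without additional populations, tests, or informative priors, which is exactly why the design questions and prior sensitivity discussed above matter.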

Website:

https://www.ics.uci.edu/~wjohnson/

2019-10-03 - John Lawson - Useful Models and Design Strategies for Experimentation - A Career Long Perspective

Presenter:

John Lawson

Title:

Useful Models and Design Strategies for Experimentation - A Career Long Perspective

Affiliation:

Department of Statistics, Brigham Young University

Date:

October 3, 2019

Abstract:

Website:

John Lawson

2019-09-26 - Matt Heiner - Bayesian Nonparametric Density Autoregression with Lag Selection

Presenter:

Matt Heiner

Title:

Bayesian Nonparametric Density Autoregression with Lag Selection

Affiliation:

Department of Statistics, Brigham Young University

Date:

September 26, 2019

Abstract:

We propose and illustrate a Bayesian nonparametric autoregressive model applied to flexibly estimate general transition densities exhibiting nonlinear lag dependence. Our approach is related to Bayesian curve fitting via joint density estimation using Dirichlet process mixtures, with the Markovian likelihood defined as the conditional distribution obtained from the mixture. This results in a nonparametric extension of a mixture-of-experts formulation. We address computational challenges to posterior sampling that arise from the conditional likelihood. We illustrate the base model by fitting to synthetic data simulated from a classical model for population dynamics, as well as a time series of successive waiting times between eruptions of Old Faithful Geyser. We explore inferences available through the base model before extending the model to include automatic relevance detection among a pre-specified set of lags. We explore methods and inferences for global and local lag selection with additional simulation studies, and illustrate by fitting to an annual time series of pink salmon abundance in a stream in Alaska. We further explore and compare transition density estimation performance for alternative configurations of the proposed model.
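A schematic version of the construction, with the particular kernel assumed here only for illustration: estimate the joint density of consecutive observations with a Dirichlet process mixture, say of bivariate normals, and take the implied conditional as the transition density,

\[
f(y_{t-1}, y_t) = \sum_{k=1}^{\infty} w_k\, \mathcal{N}_2\!\bigl((y_{t-1}, y_t)^{\top} \mid \mu_k, \Sigma_k\bigr),
\qquad
f(y_t \mid y_{t-1}) = \frac{f(y_{t-1}, y_t)}{\int f(y_{t-1}, y)\, dy}.
\]

Each conditional is again a mixture of normals whose weights depend on the lagged value, which is the mixture-of-experts form noted in the abstract; the Markovian likelihood is the product of these conditionals over time, and this conditioning is what complicates posterior sampling relative to ordinary joint density estimation.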

Website:

2019-09-19 - Adam Smith - Bayesian Analysis of Partitioned and Large-Scale Demand Models

Presenter:

Adam Smith

Title:

Bayesian Analysis of Partitioned and Large-Scale Demand Models

Affiliation:

UCL School of Management, University College London

Date:

September 19, 2019

Abstract:

The analysis of consumer purchase behavior is a core component of marketing and economic research, but becomes challenging with large product assortments. I discuss two approaches for estimating demand models with a high-dimensional set of products. The first approach is based on partitioning demand: these models assume that products can be categorized into groups and then define consumer substitution patterns at the group-level rather than product-level. While this can significantly reduce the dimension of the parameter space, it can also lead to inaccurate inferences if the product categories do not match the structure of consumer preferences. To overcome this problem, I let the partition be a model parameter and propose a Bayesian method for inference. The second approach is based on regularization: I propose a new class of shrinkage priors for price elasticities in high-dimensional demand models. The prior has a hierarchical structure where the direction and rate of shrinkage depend on the information in a product classification tree. Both approaches are illustrated with store-level scanner data, and the effects on demand predictions and product competition are discussed.
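As a generic illustration of the scale of the problem (model form assumed here, not taken from the talk): a log-log demand system for $J$ products relates each product's log sales to all log prices,

\[
\log q_{it} = \alpha_i + \sum_{j=1}^{J} \beta_{ij} \log p_{jt} + \varepsilon_{it},
\]

so the matrix of price elasticities $\{\beta_{ij}\}$ has $J^2$ entries. The partitioned approach restricts cross-price effects to act at the level of product groups, collapsing the dimension of that matrix, while the shrinkage approach keeps all $J^2$ elasticities and lets the classification tree govern how strongly, and toward what, each one is shrunk.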

Website:

Adam Smith

2019-04-04 - Daniel Apley - Understanding the Effects of Predictor Variables in Black-Box Supervised Learning Models

Presenter:

Daniel Apley

Title:

Understanding the Effects of Predictor Variables in Black-Box Supervised Learning Models

Affiliation:

Northwestern University

Date:

April 4, 2019

Abstract:

For many supervised learning applications, understanding and visualizing the effects of the predictor variables on the predicted response is of paramount importance. A shortcoming of black-box supervised learning models (e.g., complex trees, neural networks, boosted trees, random forests, nearest neighbors, local kernel-weighted methods, support vector regression, etc.) in this regard is their lack of interpretability or transparency. Partial dependence (PD) plots, which are the most popular general approach for visualizing the effects of the predictors with black box supervised learning models, can produce erroneous results if the predictors are strongly correlated, because they require extrapolation of the response at predictor values that are far outside the multivariate envelope of the training data. Functional ANOVA for correlated inputs can avoid this extrapolation but involves prohibitive computational expense and subjective choice of additive surrogate model to fit to the supervised learning model. We present a new visualization approach that we term accumulated local effects (ALE) plots, which have a number of advantages over existing methods. First, ALE plots do not require unreliable extrapolation with correlated predictors. Second, they are orders of magnitude less computationally expensive than PD plots, and many orders of magnitude less expensive than functional ANOVA. Third, they yield convenient variable importance/sensitivity measures that possess a number of desirable properties for quantifying the impact of each predictor.
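For reference, the population quantity behind an ALE main-effect plot can be written as follows (a differentiable predictor function $f$ is assumed for the sketch; in practice the derivative is replaced by finite differences of $f$ across a partition of the $x_j$ axis, averaged over the observations falling in each interval):

\[
\mathrm{ALE}_j(x) \;=\; \int_{z_{0,j}}^{x} E\!\left[\frac{\partial f(X)}{\partial X_j}\,\middle|\, X_j = z\right] dz \;-\; c_j,
\]

where $c_j$ centers the curve to average zero over the data. Because the inner expectation conditions on $X_j = z$, only combinations of predictors that actually occur near $X_j = z$ contribute, which is how ALE plots avoid the extrapolation that makes PD plots unreliable under strong correlation.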

Website:

Dr. Apley's Website

2019-03-28 - Jeff Miller - Flexible perturbation models for robustness to misspecification

Presenter:

Dr. Jeff Miller

Title:

Flexible perturbation models for robustness to misspecification

Affiliation:

Harvard

Date:

March 28, 2019

Abstract:

In many applications, there are natural statistical models with interpretable parameters that provide insight into questions of interest. While useful, these models are almost always wrong in the sense that they only approximate the true data generating process. In some cases, it is important to account for this model error when quantifying uncertainty in the parameters. We propose to model the distribution of the observed data as a perturbation of an idealized model of interest by using a nonparametric mixture model in which the base distribution is the idealized model. This provides robustness to small departures from the idealized model and, further, enables uncertainty quantification regarding the model error itself. Inference can easily be performed using existing methods for the idealized model in combination with standard methods for mixture models. Remarkably, inference can be even more computationally efficient than in the idealized model alone, because similar points are grouped into clusters that are treated as individual points from the idealized model. We demonstrate with simulations and an application to flow cytometry.
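One way to read the construction, offered as an interpretive sketch rather than the speaker's exact formulation: with idealized model $F_\theta$, the observed data are modeled as draws from a nonparametric mixture whose mixing distribution has the idealized model as its base,

\[
x_i \mid G \;\sim\; \int k_h(x_i \mid \phi)\, dG(\phi), \qquad G \mid \theta \;\sim\; \mathrm{DP}\bigl(\alpha,\, F_\theta\bigr),
\]

where $k_h$ is a narrow perturbation kernel. Clusters of similar observations then act like single draws from $F_\theta$, which is why inference for $\theta$ can reuse the idealized model's existing updates and can even be cheaper than fitting the idealized model to every raw observation.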

Website:

Dr. Miller's Website

2019-03-21 - Yue Zhang - Multi-state Approach for Studying Cancer Care Continuum using EHR data

Presenter:

Dr. Yue Zhang

Title:

Multi-state Approach for Studying Cancer Care Continuum using EHR data

Affiliation:

University of Utah

Date:

March 21, 2019

Abstract:

Diagnostic evaluation of suspected breast cancer due to abnormal screening mammography results is common, creates anxiety for women and is costly for the healthcare system. Timely evaluation with minimal use of additional diagnostic testing is key to minimizing anxiety and cost. In this paper, we propose a Bayesian semi-Markov model that allows for flexible, semi-parametric specification of the sojourn time distributions and apply our model to an investigation of the process of diagnostic evaluation with mammography, ultrasound and biopsy following an abnormal screening mammogram. We also investigate risk factors associated with the sojourn time between diagnostic tests. By utilizing semi-Markov processes, we expand on prior work that described the timing of the first test received by providing additional information such as the mean time to resolution and proportion of women with unresolved mammograms after 90 days for women requiring different sequences of tests in order to reach a definitive diagnosis. Overall, we found that older women were more likely to have unresolved positive mammograms after 90 days. Differences in the timing of imaging evaluation and biopsy were generally on the order of days and thus did not represent clinically important differences in diagnostic delay.
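For readers unfamiliar with the machinery, the semi-Markov ingredients can be written compactly (generic notation assumed): the process is governed by transition probabilities between diagnostic states and by sojourn-time distributions attached to each transition,

\[
P\bigl(\text{next state } j,\ \text{sojourn time} \le t \ \bigm|\ \text{current state } i\bigr) \;=\; p_{ij}\, F_{ij}(t),
\]

and the flexible, semi-parametric specification mentioned above concerns the $F_{ij}$, so waiting times between tests need not follow the exponential form that a continuous-time Markov chain would impose; covariates such as age can then be related to the time between tests.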

Website:

Dr. Zhang's Webpage

2019-03-14 - Dennis Tolley - DATA: Whence it Came…Where it’s Going

Presenter:

Dr. Dennis Tolley

Title:

DATA: Whence it Came…Where it’s Going

Affiliation:

BYU

Date:

March 14, 2019

Abstract:

A defining activity of statisticians is the handling, processing, analyzing and interpreting of data. With “big data” upon us, it is sometimes easy to forget some basic principles in the use of data. In this seminar I review some basic guidelines regarding data that apply before one actually begins to physically process the data files. I also review some guidelines based on the ultimate use of the results that assist in how a statistician will formulate a methodology and carry out the analysis. Application of these guidelines is illustrated with a simple problem in liquid chromatography that gives rise to a family of random walk models. These models, in turn, lay the foundation for a family of research problems in statistics.

Website:

Dr. Tolley's Website

2019-03-07 - Grant Schultz - Utah Crash Prediction Models: A Joint Effort for Success

Presenter:

Dr. Grant Schultz

Title:

Utah Crash Prediction Models: A Joint Effort for Success

Affiliation:

BYU

Date:

March 7, 2019

Abstract:

The Utah Department of Transportation (UDOT) continues to advance the safety of the state roadway network through its participation in and endorsement of the “Zero Fatalities: A Goal We Can All Live With™” campaign to increase awareness of the importance of highway safety. As a continuing effort by UDOT to advance the safety of its roadway network, research has been conducted wherein statistical models have been developed that allow users to evaluate the safety of roadways within the state. Three models have been developed by a team of Civil and Environmental Engineering and Statistics faculty and students. These models include the Utah Crash Prediction Model (UCPM), the Utah Crash Severity Model (UCSM), and the Utah Intersection Crash Prediction Model (UICPM). Using the output from these models, UDOT Safety Programs engineers, Region directors, and other interested users have access to data that will allow them to make informed decisions related to prioritizing highway safety projects and programs within the state of Utah.

Website:

Dr. Schultz's Webpage
