Seminars

Seminars are held on Thursdays at 4:00 pm in room 203 TMCB.
  • 2020-04-09 - Mike Lubkowski
  • 2020-04-02 - Christopher Wikle
  • 2020-03-26 - Sarah Schwartz
  • 2020-03-19 - Marie Tuft
  • 2020-03-12 - Julia Silge - RStudio
  • 2020-03-05 - Tom Greene
  • 2020-02-27 - David Dahl - Two for One: 1. Focal Random Partition Distribution and 2. Optimization of Clustering Criteria
    Abstract:
    Random partition models, such as the Chinese restaurant process, allow a Bayesian model to flexibly borrow strength. We present two related
    working papers on random partition models. First, while many partition
    priors are exchangeable, we propose a nonexchangeable prior based on a focal partition, a Bayesian's prior guess for the unknown partition. We
    show how our approach modifies the Chinese restaurant process so that
    partitions that are similar to the focal partition have higher
    probability. There is a weight parameter that varies between -1 and
    infinity, where 0 corresponds to the original Chinese restaurant process
    and infinity yields a point mass distribution at the focal partition. In
    the latter part of the talk, we present a novel stochastic search algorithm
    to minimize the posterior expected loss of a clustering criterion based on
    a pairwise similarity matrix. Several loss functions for clustering have
    been proposed, but the minimization of the expected loss is challenging
    given the large size and discrete nature of the search space. Our
    approach is a stochastic search based on a series of micro-optimizations
    performed in a random order. Our approach is embarrassingly parallel.
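The baseline here is the ordinary Chinese restaurant process, which the talk's focal-partition prior re-weights. A minimal sampler for the unmodified CRP (a sketch for intuition, not the speaker's code) shows the seating probabilities that the weight parameter would modify:

```python
import random

def sample_crp(n, alpha, seed=0):
    """Draw one partition of n items from the Chinese restaurant process.

    Item i joins an existing cluster with probability proportional to the
    cluster's size, or starts a new cluster with probability proportional
    to the concentration parameter alpha.
    """
    rng = random.Random(seed)
    clusters = []  # current cluster sizes
    labels = []    # cluster label of each item
    for i in range(n):
        # weights: existing cluster sizes, plus alpha for a new cluster
        weights = clusters + [alpha]
        r = rng.uniform(0, sum(weights))
        acc = 0.0
        for k, w in enumerate(weights):
            acc += w
            if r <= acc:
                break
        if k == len(clusters):
            clusters.append(1)   # open a new cluster
        else:
            clusters[k] += 1     # join cluster k
        labels.append(k)
    return labels

labels = sample_crp(10, alpha=1.0)
```

The focal-partition prior described in the talk would scale these seating weights so that partitions close to the focal partition receive higher probability.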

    Co-Authors:
    Richard Warr
    Thomas Jensen
    Devin Johnson
    Peter Müller

    Website:
    https://statistics.byu.edu/directory/dahl-david-b
  • 2020-02-20 - Paul Sabin - Estimating Player Value in Football Using Plus-Minus Models
    The use of statistical methods in sports has exploded during the past decade. Football, America’s most popular sport, has lagged behind in the adoption of “analytics.” A framework for calculating the expected points of each play was introduced by former BYU quarterback and statistics student Virgil Carter and Robert Machol in 1971. Thirty-five years later, this framework was reintroduced by Brian Burke and opened up the possibilities of analysis in the sport. Until more recently, calculating the value of football players’ on-field performance has been limited to scouting methods and quarterbacks. Adjusted Plus-Minus (APM) models have long been used in other sports, most notably basketball (Rosenbaum (2004), Kubatko et al. (2007), Winston (2009), Sill (2010)), to estimate each player’s value by accounting for those in the game at the same time. More recent methods have found ways to incorporate plus-minus models in other sports such as hockey (Macdonald (2011)) and soccer (Schultze and Wellbrock (2018) and Macdonald (2011)). These models are especially useful in coming up with results-oriented estimation of each player’s value. In American football, it is difficult to estimate every player’s value since many positions such as offensive linemen have no recorded statistics. While player-tracking data in the NFL is allowing new analysis, such data do not exist at other levels of football such as the NCAA. Using expected points, I provide a model framework that solves many of the traditional issues APM models face in football. This methodology allows the models to estimate the value of each position at each level of the sport. These coarse models will be essential to pair with fine-level player-tracking models in a multiscale framework in the future.
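For readers unfamiliar with APM, the basic formulation is a regularized regression of the score margin on indicators for who is on the field. A toy sketch with synthetic data (ridge regression here is a common choice, not necessarily the speaker's exact model):

```python
import numpy as np

rng = np.random.default_rng(1)

n_players, n_stints = 8, 200
# True (unknown) player values; we try to recover them from stint margins.
true_value = rng.normal(0, 2, n_players)

# Each row of X encodes one stint: +1 if the player is on the "home" side,
# -1 if on the "away" side, 0 if off the field.
X = np.zeros((n_stints, n_players))
for s in range(n_stints):
    on = rng.choice(n_players, size=4, replace=False)
    X[s, on[:2]] = 1.0
    X[s, on[2:]] = -1.0
y = X @ true_value + rng.normal(0, 1.0, n_stints)  # noisy point margin

# Ridge-regularized APM: beta = (X'X + lam*I)^{-1} X'y
lam = 10.0
beta = np.linalg.solve(X.T @ X + lam * np.eye(n_players), X.T @ y)
```

The regularization is what makes these models workable in practice: teammates who share the field often are nearly collinear, and shrinkage stabilizes their individual estimates.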
  • 2020-02-13 - Greg MacFarlane - Using Mobile Device Data to Measure Park Choice, Access, and Health
    Parks provide benefits to the people who can access them, but previous research attempts to quantify these benefits have used largely arbitrary means to measure access. In two papers currently under review, my coauthors and I have applied a measurement of access based in econometric choice theory to this problem. We inform this measure using mobile device data, developing models of how far people are willing to travel to reach marginally larger parks. We then correlate these choice-based measures of access to parks with tract-level data on physical activity and obesity rates, controlling for demographic variables and spatial effects. The results indicate that excellent park access improves physical activity participation rates, and suggest a marginal improvement in obesity rates beyond what physical activity and demographics can explain.
  • 2020-02-06 - Summer Rupper - Tapping into Spy Satellite Imagery to Measure Glacier Changes in the Water Towers of Asia
    Abstract:
    The high mountain regions of Asia are host to more snow and ice than anywhere outside of the polar regions. Changes in snow and ice storage in these remote landscapes have the potential to impact the nearly 1 billion people living downstream. While it is clear that glaciers are generally retreating (shrinking in size) globally, there is a significant paucity of data on glacier changes in high mountain Asia. These glacial systems are extremely remote, occur at very high altitudes, and are often located along disputed borders in geopolitically unstable regions. This has hampered our ability to access these glaciers and directly monitor changes over time. Here we tap into historical spy satellite imagery to measure the change in glacier volume across the Himalayan range over the past ~50 years. We use this new data set to assess the magnitude and rates of change in glacier-related water resources, assess the mechanisms driving these changes, and model the impacts on downstream populations.
  • 2020-01-30 - Abbas Zaidi - Evaluating the Effect of Residential Care on Self-Concept and Ego-Resilience: A Generalized Propensity Score Analysis with Clustered Data
    Abstract:
    This discussion focuses on the parametric estimation of average causal effects under a continuous treatment in a hierarchical setting. Our approach is applied to assessing the impact of the Udayan Ghar Program. This residential care system began in New Delhi, India with the purpose of providing surrogate housing and care to vulnerable and at risk children in an effort to improve their psychological development. We investigate the impact of staying in the system on the self-concept and ego-resilience of the residents as proxied by the Piers-Harris score. We find that there is a positive effect of staying in the residential care system at Udayan and that there are variations in this effect by gender. Furthermore, we strengthen our causal claims by demonstrating robustness against unmeasured confounding using a novel sensitivity analysis technique to assess how violations of this key identifying assumption impact our findings.

    Website:
    https://sites.google.com/site/amzaidistatistics/
  • 2020-01-23 - Kevin Moon - Visualizing the True Structure of Big Data for Data Exploration
    Abstract:
    We live in an era of big data in which researchers in nearly every field are generating thousands or even millions of samples in high dimensions. Most methods in data science focus on prediction or impose restrictive assumptions that require established knowledge and understanding of the data; i.e., these methods require some level of expert supervision. However, in many cases, this knowledge is unavailable and the goal of data analysis is scientific discovery and to develop a better understanding of the data. There is an especially strong need for unsupervised data visualization methods that accurately represent the true structure of the data, which is crucial for developing intuition and understanding of the data. In this talk, I will present PHATE: an unsupervised data visualization tool based on a new information distance that excels at denoising the data while preserving both global and local structure. I will demonstrate PHATE on a variety of datasets including facial images, mass cytometry data, and new single-cell RNA-sequencing data. On the latter, I will show how PHATE can be used to discover novel surface markers for sorting cell populations. In addition, I will present DIG, a visualization tool inspired by PHATE that theoretically eliminates nearly all sources of noise when visualizing dynamical systems. I will then demonstrate DIG on EEG sleep data.

    Website:
    https://sites.google.com/a/umich.edu/kevin-r-moon/home
  • 2019-12-05 - Derek Tucker - Elastic Functional Data Analysis
    Abstract:
    Functional data analysis (FDA) is an important research area, due to its broad applications across many disciplines where functional data is prevalent. An essential component in solving these problems is the registration of points across functional objects. Without proper registration, the results are often inferior and difficult to interpret. The current practice in the FDA literature is to treat registration as a pre-processing step, using off-the-shelf alignment procedures, and follow it up with statistical analysis of the resulting data. In contrast, an Elastic framework is a more comprehensive approach, where one solves for the registration and statistical inferences in a simultaneous fashion. Our goal is to use a metric with appropriate invariance properties, to form objective functions for alignment and to develop statistical models involving functional data. While these elastic metrics are complicated in general, we have developed a family of square-root transformations that map these metrics into simpler Euclidean metrics, thus enabling more standard statistical procedures. Specifically, we have developed techniques for elastic functional PCA, elastic tolerance bounds, and elastic regression models involving functional variables. I will demonstrate these ideas using simulated data and real data from various sources.

    J. Derek Tucker is a Principal Member of the Technical Staff at Sandia National Laboratories. He received his B.S. in Electrical Engineering cum laude and M.S. in Electrical Engineering from Colorado State University in 2007 and 2009, respectively. In 2014 he received a Ph.D. degree in Statistics from Florida State University in Tallahassee, FL under the co-advisement of Dr. Anuj Srivastava and Dr. Wei Wu. He currently leads research projects in the areas of satellite image registration and point process modeling for monitoring applications. His research is focused on pattern-theoretic approaches to problems in image analysis, computer vision, signal processing, and functional data analysis. In 2017, he received the Director of National Intelligence Team Award for his contributions to the Signal Location in Complex Environments (SLiCE) team.

    Affiliation:
    Sandia National Labs

    Date:
    2019-12-05
  • 2019-11-21 - Antonio Villanueva-Morales - Modified Pseudo-likelihood Estimation for Markov Random Fields on Lattice
    Abstract:
    The probability function of spatial statistical models involves, in general, an extremely awkward normalizing function of the parameters, known as the partition function in statistical mechanics, with the consequence that a direct approach to statistical inference through maximum likelihood (ML) is rarely possible. In order to avoid such intractability, Besag (1975) introduced an alternative technique known as the method of maximum pseudo-likelihood (MPL) owing to its merit of being easy to implement. The maximum pseudo-likelihood estimator (MPLE) is the value of the parameter that maximizes the pseudo-likelihood defined as the direct product of conditional probabilities or conditional probability densities of the variable at each site. It has been mathematically demonstrated that, under suitable conditions, the MPLEs are strongly consistent and asymptotically normally distributed around the true parameter value for large samples of various spatial processes. On the other hand, the MPL method trades away efficiency for computational ease. It has been shown that in many situations the MPLE is not efficient in comparison with the ML estimator (MLE). According to these studies, the MPLEs are as good as the MLEs in the weak interaction case, but the difference between the two becomes substantial when spatial interactions are strong.

    Huang and Ogata (2002) address the problem of improving the efficiency of MPLEs while still keeping the technique computationally feasible and proposed the maximum generalized pseudo-likelihood (MGPL) method for Markov random field (MRF) models on lattice. The MGPL estimator (MGPLE) is the value of the parameter that maximizes the generalized pseudo-likelihood function (GPL). This GPL is the multivariate version of Besag's pseudo-likelihood which is constructed first by defining a group of adjacent sites for each site in the lattice and then taking the product of the multivariate conditional probability distributions (MCPD) of the groups of random variables defined on each group of adjacent sites. Simulation results for an Ising and two auto-normal models on a region of square lattice showed better performance of the MGPLE than the MPLE, and the performance became better as the size of the groups of adjacent sites increased. On the other hand, it was observed that as the size of the groups of adjacent sites increased, the computing complexity for the MGPLE increased exponentially due to the presence of a normalizing integral (a sum in the case of discrete site variables) in the expression for each MCPD which has to be evaluated all over the support of the joint distribution for groups of site variables in each case. Because of this, for continuous MRFs other than auto-normal and discrete MRFs with site variables assuming more than two values, an enormous effort might be required making the implementation of the MGPL method practically unfeasible even for small square lattices. 
    For example, in MRFs where each site variable, conditional on its neighbors, follows the distribution of a Winsorized Poisson random variable (Kaiser and Cressie (1997)), the computation of the normalizing integrals rapidly becomes prohibitive with the size of the groups of adjacent sites even for small square lattices, as the support of this distribution may be in the hundreds (or thousands).

    In our research we propose a conditional pairwise pseudo-likelihood (CPPL) for parameter estimation in Markov random fields on lattice. The CPPL is defined as the direct product of conditional pairwise distributions corresponding to the pairs of random variables associated with the cliques of size two from the collection of spatial locations on a region of a lattice. Thus the CPPL is a modified version of Besag's pseudo-likelihood (PL) and Huang and Ogata's generalized pseudo-likelihood (GPL) in that it is not constructed based on defining a group of adjacent sites for each site in the lattice. We carry out calculations of the correspondingly defined maximum conditional pairwise pseudo-likelihood estimator (MCPPLE) for Markov random fields with Winsorized Poisson conditional distributions on the lattice. These simulation studies show that the MCPPLE has significantly better performance than Besag's maximum pseudo-likelihood estimator (MPLE), and its calculation is almost as easy to implement as the MPLE. Therefore, we suggest that for situations where each discrete local random variable conditional on its neighbors assumes more than two possible values, as in the Winsorized Poisson case, estimation based on the CPPL may be a computationally more feasible alternative than estimation based on Huang and Ogata's GPL.
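As background, Besag's pseudo-likelihood is easy to state for the simplest case, a zero-field Ising model on a square lattice, where each full conditional has a closed form and no partition function appears. A toy sketch (the baseline MPL criterion, not the speaker's CPPL implementation):

```python
import numpy as np

def neg_log_pl(beta, x):
    """Negative log pseudo-likelihood for a zero-field Ising model on a grid.

    Besag's pseudo-likelihood replaces the intractable joint (with its
    partition function) by the product of full conditionals
        P(x_i | neighbors) = exp(beta * x_i * s_i) / (2 * cosh(beta * s_i)),
    where s_i is the sum of the four nearest-neighbor spins (zero-padded
    at the boundary).
    """
    s = np.zeros_like(x, dtype=float)
    s[1:, :] += x[:-1, :]
    s[:-1, :] += x[1:, :]
    s[:, 1:] += x[:, :-1]
    s[:, :-1] += x[:, 1:]
    return -(beta * x * s - np.log(2 * np.cosh(beta * s))).sum()

# Independent +/-1 spins correspond to true interaction beta = 0;
# the MPLE (found here by grid search) should land near zero.
rng = np.random.default_rng(0)
x = rng.choice([-1.0, 1.0], size=(30, 30))
betas = np.linspace(-1, 1, 401)
mple = betas[np.argmin([neg_log_pl(b, x) for b in betas])]
```

Each factor above is a single two-point conditional, which is why MPL sidesteps the normalizing constant entirely; the CPPL proposed in the talk replaces these site-wise conditionals with pairwise ones over cliques of size two.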

    Affiliation:
    Statistics Department, Chapingo Autonomous University
  • 2019-11-14 - Jennifer Sinnott - Genetic Association Testing with Imperfect Phenotypes Derived From Electronic Health Records
    Abstract:
    Electronic health records linked to blood samples form a powerful new data resource that can provide much larger, more diverse samples for testing associations between genetic markers and disease. However, algorithms for estimating certain phenotypes, especially those that are complex and/or difficult to diagnose, produce outcomes subject to measurement error. Much work is needed to determine best practices for implementing and analyzing such data. To this end, we recently proposed a method for analyzing case-control studies when disease status is estimated by a phenotyping algorithm; our method improves power and eliminates bias when compared to the standard approach of dichotomizing the algorithm prediction and analyzing the data as though case-control status were known perfectly. The method relies on knowing certain qualities of the algorithm, such as its sensitivity, specificity, and positive predictive value, but in practice these may not be known if no "gold-standard" phenotypes are known in the population. A common setting where that occurs is in phenome-wide association studies (PheWASs), in which a wide range of phenotypes are of interest, and all that is available for each phenotype is a surrogate measure, such as the number of billing codes for that disease. We proposed a method to perform genetic association tests in this setting, which improves power over existing methods that typically identify cases based on thresholding the number of billing codes. In this talk, I will describe these methods, and present applications to studies of rheumatoid arthritis in the Partners Healthcare System.
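One ingredient the abstract mentions, knowing the algorithm's sensitivity and specificity, already permits a simple bias correction of prevalence estimates: the classical Rogan-Gladen adjustment, shown here as background rather than as the speaker's method:

```python
def corrected_prevalence(p_obs, sens, spec):
    """Rogan-Gladen correction: recover true prevalence from the observed
    positive rate of an imperfect classifier with known sensitivity and
    specificity.  Since p_obs = sens*p + (1-spec)*(1-p), solve for p.
    """
    return (p_obs + spec - 1) / (sens + spec - 1)

# A phenotyping algorithm with sensitivity 0.85 and specificity 0.95
# that flags 20% of patients implies a smaller true disease prevalence.
p = corrected_prevalence(0.20, sens=0.85, spec=0.95)
```

The correction is only valid when sens + spec > 1 and the error rates are truly known; the talk's methods address the harder setting where they are not.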

    Affiliation:
    Ohio State University/University of Utah

    Date:
    2019-11-14
  • 2019-11-05 - Jacob Mortensen - Statistical Methods for Modeling Movement
    Abstract:
    In recent years, tracking data in sports has become widespread, allowing researchers to model movement at a very high level of detail. In this talk I will present two examples of statistical research inspired by this type of data. In the first, I present a method for nonparametric estimation of continuous-state Markov transition densities. Our approach uses a Poisson point process to represent the joint transition space, then divides that process by the marginal intensity to estimate the conditional transition density. Modeling a transition density as a point process creates a general framework that admits a wide variety of implementations, depending on suitability for a given application and at the discretion of the modeler. A key feature of this point process representation is that it allows the presence of spatial structure to inform transition density estimation. We illustrate this by using our method to model ball movement in the National Basketball Association, enabling us to capture the effects of spatial features, such as the three point line, that impact transition density values. In the second, I will show how broadcast-derived tracking data can be used to estimate external load metrics in sports science. Sports scientists use high resolution coordinate data to estimate external load metrics, such as acceleration load and high speed running distance, traditionally used to understand the physical toll a game takes on an athlete. Unfortunately, collecting this data requires installation of expensive hardware and paying costly licensing fees to data providers, restricting its availability. Algorithms have been developed that allow a traditional broadcast feed to be converted to x-y coordinate data, making tracking data easier to acquire, but coordinates are available for an athlete only when that player is within the camera frame. Obviously, this leads to inaccuracies in external load estimates, limiting the usefulness of this data for sports scientists. In this research, we develop models that predict offscreen load metrics and demonstrate the viability of broadcast-derived tracking data for understanding external load in soccer.
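The joint-over-marginal ratio described for the first project can be mimicked with plain kernel smoothing: smooth the observed (x_t, x_{t+1}) pairs into a joint surface, then divide by the smoothed marginal of x_t. This sketch (a toy stand-in for the Poisson point process formulation, applied to a simulated AR(1) chain) illustrates the idea:

```python
import numpy as np

def transition_density(x_grid, y_grid, pairs, h=0.3):
    """Estimate a continuous-state transition density p(y | x) on a grid
    by dividing a kernel-smoothed joint estimate of (x_t, x_{t+1}) by a
    kernel-smoothed marginal intensity of x_t."""
    x0, x1 = pairs[:, 0], pairs[:, 1]
    gx = np.exp(-0.5 * ((x_grid[:, None] - x0[None, :]) / h) ** 2)
    gy = np.exp(-0.5 * ((y_grid[:, None] - x1[None, :]) / h) ** 2)
    joint = gx @ gy.T               # (len(x_grid), len(y_grid))
    marginal = gx.sum(axis=1)       # smoothed intensity of x_t
    cond = joint / marginal[:, None]
    # normalize each row so it integrates to 1 over y
    cond /= cond.sum(axis=1, keepdims=True) * (y_grid[1] - y_grid[0])
    return cond

# Toy Markov chain: x_{t+1} = 0.8 * x_t + noise
rng = np.random.default_rng(2)
x = np.zeros(2000)
for t in range(1999):
    x[t + 1] = 0.8 * x[t] + rng.normal(0, 0.5)
pairs = np.column_stack([x[:-1], x[1:]])

grid = np.linspace(-3, 3, 61)
cond = transition_density(grid, grid, pairs)
```

Each row of `cond` is an estimated conditional density of the next state; in the point process version, spatial structure (e.g., the three point line) would enter through the intensity model instead of a fixed kernel.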

    Affiliation:
    Simon Fraser University
  • 2019-10-31 - Nathan Sandholtz - Modeling human decision-making in spatio-temporal systems: An observational and an experimental case study
    Abstract:
    In this talk I present two contrasting analyses of human decision-making behavior in spatio-temporal systems. In the first case, we examine player shooting decisions in professional basketball. We assume that all players operate under the same objective function on offense, namely, maximizing their team's total expected points. Our goal is to identify areas where, conditional on location, lineups exhibit potential inefficiencies in allocating shots among their players. We do this by comparing a player's probability of making a shot to the rate at which he shoots in context of both his four teammates on the court and the spatial distribution of his shots. While on average players are highly efficient with respect to the shot allocation metrics we introduce, nearly every lineup exhibits some degree of potential inefficiency. We estimate and visualize the points that are potentially lost and identify which players are responsible.

    In the second case, we analyze an experiment in which subjects were tasked with maximizing a reward in a simple "hotspot" computer game. As in the basketball example, subjects made decisions to maximize a point total. However, unlike shots in a basketball game, this task was specifically designed to induce uncertainty about the effect an action has on the subsequent point outcome. This forced subjects to balance exploration and exploitation in their strategy. Our analysis shows that subjects exhibit vastly different preferences regarding the exploration vs. exploitation tradeoff. For this reason, we cannot assume a global strategy which all subjects follow. On the contrary, inferring each subject's latent strategy (or acquisition function, as referred to in the Bayesian optimization literature) actually becomes the primary goal of our research. We find that the classical suite of acquisition functions doesn't adequately explain every subject's behavior, and we propose a modification to this suite of acquisition functions which better explains the subjects' collective set of decisions.

    Affiliation:
    Simon Fraser University

    Date:
    2019-10-31
  • 2019-10-24 - Alex Petersen - Partial Separability and Graphical Models for Multivariate Functional Data
    Abstract:
    Graphical models are a ubiquitous tool for identifying dependencies among components of high-dimensional multivariate data. Recently, these tools have been extended to estimate dependencies between components of multivariate functional data by applying multivariate methods to the coefficients of truncated basis expansions. A key difficulty compared to multivariate data is that the covariance operator is compact, and thus not invertible. In this talk, we will discuss a property called partial separability that circumvents the invertibility issue and identifies the functional graphical model with a countable collection of finite-dimensional graphical models. This representation allows for the development of simple and intuitive estimators. Finally, we will demonstrate the empirical findings of our method through simulation and analysis of functional brain connectivity during a motor task.

    Affiliation:
    University of California, Santa Barbara

    Date:
    2019-10-24
  • 2019-10-17 - Jonathan Blake - A Risk Manager's Guide to a Faith Journey
    Abstract:
    Jonathan Blake, recipient of the college Alumni Achievement Award, will give a guest lecture on Thursday, October 17. Entitled “A Risk Manager’s Guide to a Faith Journey,” the lecture will take place at 11 a.m. in room 1170 of the Talmage Building. The public is invited to attend the event. For over twenty years, Blake has been employed in a variety of actuarial roles. He is currently the Vice President and Lead Actuary at The Hanover Insurance Group. In this position, he assesses the financial strength of over three billion dollars in domestic reserve position. Blake is also a partner with the Personal, Commercial, and Specialty business units, where he helps units engage in profitable growth strategic initiatives. Blake graduated magna cum laude with a B.S. and M.S. from the Department of Statistics in the College of Physical and Mathematical Sciences. He has previously worked in Ohio, Texas, and Massachusetts and served as president, vice president, and board member of the Casualty Actuaries of New England. He is currently a Fellow of the Casualty Actuarial Society and a member of the American Academy of Actuaries. Blake and his wife, Julia, have six children. Each year, every college on Brigham Young University campus honors one alumnus or alumna with this prestigious award. Blake received the 2019 Alumni Achievement Award from the College of Physical and Mathematical Sciences.

    Affiliation:
    Hanover
  • 2019-10-10 - Wes Johnson - Gold Standards are Out and Bayes is In: Implementing the Cure for Imperfect Reference Tests in Diagnostic Accuracy Studies
    Abstract:
    Bayesian mixture models, often termed latent class models, allow users to estimate the diagnostic accuracy of tests and true prevalence in one or more populations when the positive and/or negative reference standards are imperfect. Moreover, they allow the data analyst to show the superiority of a novel test over an old test, even if this old test is the (imperfect) reference standard. We use published data on Toxoplasmosis in pigs to explore the effects of numbers of tests, numbers of populations, and dependence structure among tests to ensure model (local) identifiability. We discuss and make recommendations about the use of priors, sensitivity analysis, model identifiability and study design options, and strongly argue for the use of Bayesian mixture models as a logical and coherent approach for estimating the diagnostic accuracy of two or more tests.
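A quick calculation illustrates why imperfect reference standards matter: under the conditional-independence assumption common to latent class models, even a perfect new test appears imperfect when scored against an imperfect reference. A sketch (background arithmetic, not the speaker's model):

```python
def apparent_sensitivity(se_t, sp_t, se_r, sp_r, prev):
    """Apparent sensitivity of a new test T when judged against an
    imperfect reference R, assuming T and R are conditionally
    independent given true disease status.

    Returns P(T+ | R+), marginalizing over the unknown true status.
    """
    # P(T+, R+): both positive, via true-positive and false-positive paths
    p_tr = se_t * se_r * prev + (1 - sp_t) * (1 - sp_r) * (1 - prev)
    # P(R+): reference positives among diseased and non-diseased
    p_r = se_r * prev + (1 - sp_r) * (1 - prev)
    return p_tr / p_r

# A perfect new test (se = sp = 1) still looks imperfect against a
# reference with 90% sensitivity and 95% specificity at 30% prevalence.
app = apparent_sensitivity(1.0, 1.0, se_r=0.90, sp_r=0.95, prev=0.30)
```

Here the perfect test's apparent sensitivity is only about 0.885, which is the kind of distortion the Bayesian mixture (latent class) framework is designed to undo.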

    Affiliation:
    Department of Statistics, University of California Irvine

    Date:
    October 10, 2019

    Website:
    https://www.ics.uci.edu/~wjohnson/
  • 2019-10-03 - John Lawson - Useful Models and Design Strategies for Experimentation - A Career Long Perspective
    Affiliation:
    Department of Statistics, Brigham Young University

    Date:
    October 3, 2019

    Website:
    John Lawson
  • 2019-09-26 - Matt Heiner - Bayesian Nonparametric Density Autoregression with Lag Selection
    Abstract:
    We propose and illustrate a Bayesian nonparametric autoregressive model applied to flexibly estimate general transition densities exhibiting nonlinear lag dependence. Our approach is related to Bayesian curve fitting via joint density estimation using Dirichlet process mixtures, with the Markovian likelihood defined as the conditional distribution obtained from the mixture. This results in a nonparametric extension of a mixture-of-experts formulation. We address computational challenges to posterior sampling that arise from the conditional likelihood. We illustrate the base model by fitting to synthetic data simulated from a classical model for population dynamics, as well as a time series of successive waiting times between eruptions of Old Faithful Geyser. We explore inferences available through the base model before extending the model to include automatic relevance detection among a pre-specified set of lags. We explore methods and inferences for global and local lag selection with additional simulation studies, and illustrate by fitting to an annual time series of pink salmon abundance in a stream in Alaska. We further explore and compare transition density estimation performance for alternative configurations of the proposed model.

    Affiliation:
    Department of Statistics, Brigham Young University

    Date:
    September 26, 2019
  • 2019-09-19 - Adam Smith - Bayesian Analysis of Partitioned and Large-Scale Demand Models
    Abstract:
    The analysis of consumer purchase behavior is a core component of marketing and economic research, but becomes challenging with large product assortments. I discuss two approaches for estimating demand models with a high-dimensional set of products. The first approach is based on partitioning demand: these models assume that products can be categorized into groups and then define consumer substitution patterns at the group-level rather than product-level. While this can significantly reduce the dimension of the parameter space, it can also lead to inaccurate inferences if the product categories do not match the structure of consumer preferences. To overcome this problem, I let the partition be a model parameter and propose a Bayesian method for inference. The second approach is based on regularization: I propose a new class of shrinkage priors for price elasticities in high-dimensional demand models. The prior has a hierarchical structure where the direction and rate of shrinkage depend on the information in a product classification tree. Both approaches are illustrated with store-level scanner data and the effects on demand predictions and product competition are discussed.

    Affiliation:
    UCL School of Management, University College London

    Date:
    September 19, 2019

    Website:
    Adam Smith
  • 2019-04-04 - Daniel Apley - Understanding the Effects of Predictor Variables in Black-Box Supervised Learning Models

    Presenter:

    Daniel Apley

    Title:

    Understanding the Effects of Predictor Variables in Black-Box Supervised Learning Models

    Affiliation:

    Northwestern University

    Date:

    April 4, 2019

    Abstract:

    For many supervised learning applications, understanding and visualizing the effects of the predictor variables on the predicted response is of paramount importance. A shortcoming of black-box supervised learning models (e.g., complex trees, neural networks, boosted trees, random forests, nearest neighbors, local kernel-weighted methods, support vector regression, etc.) in this regard is their lack of interpretability or transparency. Partial dependence (PD) plots, which are the most popular general approach for visualizing the effects of the predictors with black box supervised learning models, can produce erroneous results if the predictors are strongly correlated, because they require extrapolation of the response at predictor values that are far outside the multivariate envelope of the training data. Functional ANOVA for correlated inputs can avoid this extrapolation but involves prohibitive computational expense and subjective choice of additive surrogate model to fit to the supervised learning model. We present a new visualization approach that we term accumulated local effects (ALE) plots, which have a number of advantages over existing methods. First, ALE plots do not require unreliable extrapolation with correlated predictors. Second, they are orders of magnitude less computationally expensive than PD plots, and many orders of magnitude less expensive than functional ANOVA. Third, they yield convenient variable importance/sensitivity measures that possess a number of desirable properties for quantifying the impact of each predictor.
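A first-order ALE curve can be computed in a few lines. This sketch (an illustrative implementation following the accumulate-and-center recipe described above, not Dr. Apley's code) uses a known additive function with strongly correlated predictors, the setting where PD plots break down:

```python
import numpy as np

def ale_1d(predict, X, j, n_bins=10):
    """First-order accumulated local effects (ALE) of feature j.

    Within each quantile bin of x_j, average the change in the prediction
    when x_j is moved from the bin's lower to upper edge while the other
    features keep their observed values; then accumulate and center.
    """
    z = np.quantile(X[:, j], np.linspace(0, 1, n_bins + 1))  # bin edges
    idx = np.clip(np.searchsorted(z, X[:, j], side="right") - 1, 0, n_bins - 1)
    effects = np.zeros(n_bins)
    for k in range(n_bins):
        rows = X[idx == k]
        if len(rows) == 0:
            continue
        lo, hi = rows.copy(), rows.copy()
        lo[:, j], hi[:, j] = z[k], z[k + 1]
        effects[k] = (predict(hi) - predict(lo)).mean()  # local effect
    ale = np.cumsum(effects)
    return z, ale - ale.mean()  # accumulate, then center

# Correlated predictors and a known additive function f(x) = 2*x0 + x1.
rng = np.random.default_rng(3)
x0 = rng.normal(size=1000)
X = np.column_stack([x0, 0.9 * x0 + 0.1 * rng.normal(size=1000)])
f = lambda A: 2 * A[:, 0] + A[:, 1]

z, ale = ale_1d(f, X, j=0)
```

Because only local differences within each bin are averaged, no prediction is ever requested far outside the training envelope, which is the key contrast with PD plots under correlated predictors.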

    Website:

    Dr. Apley's Website
  • 2019-03-28 - Jeff Miller - Flexible perturbation models for robustness to misspecification

    Presenter:

    Dr. Jeff Miller

    Title:

    Flexible perturbation models for robustness to misspecification

    Affiliation:

    Harvard

    Date:

    March 28, 2019

    Abstract:

    In many applications, there are natural statistical models with interpretable parameters that provide insight into questions of interest. While useful, these models are almost always wrong in the sense that they only approximate the true data generating process. In some cases, it is important to account for this model error when quantifying uncertainty in the parameters. We propose to model the distribution of the observed data as a perturbation of an idealized model of interest by using a nonparametric mixture model in which the base distribution is the idealized model. This provides robustness to small departures from the idealized model and, further, enables uncertainty quantification regarding the model error itself. Inference can easily be performed using existing methods for the idealized model in combination with standard methods for mixture models. Remarkably, inference can be even more computationally efficient than in the idealized model alone, because similar points are grouped into clusters that are treated as individual points from the idealized model. We demonstrate with simulations and an application to flow cytometry.

    Website:

    Dr. Miller's Website
  • Toggle Item
    2019-03-21 - Yue Zhang - Multi-state Approach for Studying Cancer Care Continuum using EHR data

    Presenter:

    Dr. Yue Zhang

    Title:

    Multi-state Approach for Studying Cancer Care Continuum using EHR data

    Affiliation:

    University of Utah

    Date:

    March 21, 2019

    Abstract:

    Diagnostic evaluation of suspected breast cancer due to abnormal screening mammography results is common, creates anxiety for women and is costly for the healthcare system. Timely evaluation with minimal use of additional diagnostic testing is key to minimizing anxiety and cost. In this paper, we propose a Bayesian semi-Markov model that allows for flexible, semi-parametric specification of the sojourn time distributions and apply our model to an investigation of the process of diagnostic evaluation with mammography, ultrasound and biopsy following an abnormal screening mammogram. We also investigate risk factors associated with the sojourn time between diagnostic tests. By utilizing semi-Markov processes, we expand on prior work that described the timing of the first test received by providing additional information such as the mean time to resolution and proportion of women with unresolved mammograms after 90 days for women requiring different sequences of tests in order to reach a definitive diagnosis. Overall, we found that older women were more likely to have unresolved positive mammograms after 90 days. Differences in the timing of imaging evaluation and biopsy were generally on the order of days and thus did not represent clinically important differences in diagnostic delay.

    Website:

    Dr. Zhang's Webpage
  • Toggle Item
    2019-03-14 - Dennis Tolley - DATA: Whence it Came…Where it’s Going

    Presenter:

    Dr. Dennis Tolley

    Title:

    DATA: Whence it Came…Where it’s Going

    Affiliation:

    BYU

    Date:

    March 14, 2019

    Abstract:

    A defining activity of statisticians is the handling, processing, analyzing and interpreting of data. With “big data” upon us, it is sometimes easy to forget some basic principles in the use of data. In this seminar I review some basic guidelines regarding data that apply before one actually begins to physically process the data files. I also review some guidelines based on the ultimate use of the results that assist in how a statistician will formulate a methodology and carry out the analysis. Application of these guidelines is illustrated with a simple problem in liquid chromatography that gives rise to a family of random walk models. These models, in turn, lay the foundation for a family of research problems in statistics.

    Website:

    Dr. Tolley's Website
  • Toggle Item
    2019-03-07 - Grant Schultz - Utah Crash Prediction Models: A Joint Effort for Success

    Presenter:

    Dr. Grant Schultz

    Title:

    Utah Crash Prediction Models: A Joint Effort for Success

    Affiliation:

    BYU

    Date:

    March 7, 2019

    Abstract:

    The Utah Department of Transportation (UDOT) continues to advance the safety of the state roadway network through their participation and endorsement of the “Zero Fatalities: A Goal We Can All Live With™” campaign to increase awareness of the importance of highway safety. As a continuing effort by UDOT to advance the safety of their roadway network, research has been conducted wherein statistical models have been developed that allow users to evaluate the safety of roadways within the state. Three models have been developed by a team of Civil and Environmental Engineering and Statistics faculty and students. These models include the Utah Crash Prediction Model (UCPM), the Utah Crash Severity Model (UCSM), and the Utah Intersection Crash Prediction Model (UICPM). Using the output from these models, UDOT Safety Programs engineers, Region directors, and other interested users have access to data that will allow them to make informed decisions related to prioritizing highway safety projects and programs within the state of Utah.

    Website:

    Dr. Schultz's Webpage
  • Toggle Item
    2019-02-28 - Ephraim Hanks - Random walk spatial models for spatially correlated genetic data

    Presenter:

    Dr. Ephraim Hanks

    Title:

    Random walk spatial models for spatially correlated genetic data

    Affiliation:

    Penn State

    Date:

    February 28, 2019

    Abstract:

    Landscape genetics is the study of how landscape features, like rivers, mountains, and roads, influence genetic connectivity of wildlife populations. We build models for spatial genetic correlation based on spatio-temporal models for how animals move across the landscape. This approach provides insights into common spatial models, such as simultaneous autoregressive (SAR) models and Matérn covariance models. It also allows for scientific interpretation of spatial covariance parameters. We illustrate this approach in a study of brook trout, where we provide the first parametric description of how stream characteristics influence genetic connectivity.

    Website:

    Dr. Hanks' Website
  • Toggle Item
    2019-02-21 - Michele Guindani - Bayesian Approaches to Dynamic Model Selection

    Presenter:

    Michele Guindani

    Title:

    Bayesian Approaches to Dynamic Model Selection

    Affiliation:

    University of California, Irvine

    Date:

    February 21, 2019

    Abstract:

    In many applications, investigators monitor processes that vary in space and time, with the goal of identifying temporally persistent and spatially localized departures from a baseline or “normal” behavior. In this talk, I will first discuss a principled Bayesian approach for estimating time-varying functional connectivity networks from brain fMRI data. Dynamic functional connectivity, i.e., the study of how interactions among brain regions change dynamically over the course of an fMRI experiment, has recently received wide interest in the neuroimaging literature. Our method utilizes a hidden Markov model for classification of latent neurological states, achieving estimation of the connectivity networks in an integrated framework that borrows strength over the entire time course of the experiment. Furthermore, we assume that the graph structures, which define the connectivity states at each time point, are related within a super-graph, to encourage the selection of the same edges among related graphs. Then, I will propose a Bayesian nonparametric model selection approach with an application to the monitoring of pneumonia and influenza (P&I) mortality, to detect influenza outbreaks in the continental United States. More specifically, we introduce a zero-inflated conditionally identically distributed species sampling prior, which allows information to be borrowed across time and data to be assigned to clusters associated with either a null or an alternative process. Spatial dependences are accounted for by means of a Markov random field prior, which allows the selection to be informed by inferences conducted at nearby locations. We show how the proposed modeling framework performs in an application to the P&I mortality data and in a simulation study, and compare with common threshold methods for detecting outbreaks over time, with more recent Markov switching based models, and with other Bayesian nonparametric priors that do not take into account spatio-temporal dependence.

    Website:

    Dr. Guindani's Website
  • Toggle Item
    2019-02-14 - Garritt Page - Temporal and Spatio-Temporal Random Partition Models

    Presenter:

    Dr. Garritt Page

    Title:

    Temporal and Spatio-Temporal Random Partition Models

    Affiliation:

    BYU

    Date:

    February 14, 2019

    Abstract:

    The number of scientific fields that regularly collect data that are temporally and spatially referenced continues to experience rapid growth. An intuitive feature in data that are spatio-temporal is that measurements taken on experimental units near each other in time and space tend to be similar. Because of this, many methods developed to accommodate spatio-temporal dependent structures perform a type of implicit grouping based on time and space. Rather than implicitly grouping observations through a type of smoothing, we develop a class of dependent random partition models that explicitly models spatio-temporal clustering. This model can be thought of as a joint distribution for a sequence of random partitions indexed by time and space. We first detail how temporal dependence is incorporated so that partitions evolve gently over time. Then a few properties of the joint model are derived and induced dependence at the observation level is explored. Afterwards, we demonstrate how space can be integrated. Computation strategies are detailed and we apply the method to Chilean standardized testing scores.

    Website:

    Dr. Page's Website

  • Toggle Item
    2019-02-07 - Gil Fellingham - Predicting Performance of Professional Golfers

    Presenter:

    Dr. Gil Fellingham

    Title:

    Predicting Performance of Professional Golfers

    Affiliation:

    BYU

    Date:

    February 7, 2019

    Abstract:

    Many statisticians agree that building models that predict well should be a high priority (Harville, 2014; Stern, 2014; Berry and Berry, 2014). The purpose of this paper is to test the predictive ability of various Bayesian models using a group of closely matched members of the Professional Golf Association (PGA). Predicting performance of PGA golfers is a notoriously difficult task. We fit six different models to scores produced by 22 PGA golfers playing on 18 different golf courses in 2014. We then use these models to predict scores for the same golfers and golf courses as well as other golfers and other courses in 2015. We varied model complexity across two different dimensions. In one dimension we fit model intercepts using parametric Bayesian, nonparametric Bayesian, and hierarchical Bayesian methods. In the other dimension, we either included covariates for driving distance, greens hit in regulation, and difficulty of course as measured by slope, or we did not include the covariates. Preliminary results indicate that nonparametric Bayesian methods seem marginally better.

    Website:

    Dr. Fellingham's Webpage
  • Toggle Item
    2019-01-31 - Matthias Katzfuss - Gaussian-Process Approximations for Big Data

    Presenter:

    Matthias Katzfuss

    Title:

    Gaussian-Process Approximations for Big Data

    Affiliation:

    Texas A&M University

    Date:

    January 31, 2019

    Abstract:

    Gaussian processes (GPs) are popular, flexible, and interpretable probabilistic models for functions. GPs are well suited for big data in areas such as machine learning, regression, and geospatial analysis. However, direct application of GPs is computationally infeasible for large datasets. We consider a framework for fast GP inference based on the so-called Vecchia approximation. Our framework contains many popular existing GP approximations as special cases. Representing the models by directed acyclic graphs, we determine the sparsity of the matrices necessary for inference, which leads to new insights regarding the computational properties. Based on these results, we propose novel Vecchia approaches for noisy, non-Gaussian, and massive data. We provide theoretical results, conduct numerical comparisons, and apply the methods to satellite data.
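    The Vecchia idea this framework builds on can be sketched in one dimension: order the observations and let each conditional density depend on only its m nearest previously ordered neighbors, so the joint Gaussian likelihood is replaced by a product of cheap low-dimensional conditionals. The sketch below is illustrative only (zero-mean GP, user-supplied covariance function; not the authors' code).

```python
# Minimal 1-D Vecchia-type approximation to a zero-mean GP log-likelihood.
# Each ordered observation conditions on at most its m nearest
# previously-ordered neighbors. All names are illustrative.
import numpy as np

def vecchia_loglik(y, locs, cov_fn, m=5):
    n = len(y)
    ll = 0.0
    for i in range(n):
        k_ii = float(cov_fn(locs[i], locs[i]))
        if i == 0:
            mu, var = 0.0, k_ii            # no history: use the prior
        else:
            d = np.abs(locs[:i] - locs[i])
            c = np.argsort(d)[:m]          # nearest previously ordered points
            K_cc = cov_fn(locs[c][:, None], locs[c][None, :])
            k_ic = cov_fn(locs[i], locs[c])
            w = np.linalg.solve(K_cc, k_ic)
            mu = w @ y[c]                  # Gaussian conditional mean
            var = k_ii - w @ k_ic          # and conditional variance
        ll += -0.5 * (np.log(2 * np.pi * var) + (y[i] - mu) ** 2 / var)
    return ll
```

    With m equal to the full history this recovers the exact likelihood via the chain rule; for a 1-D exponential (OU) covariance with sorted locations, even m = 1 is exact because the process is Markov.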

    Website:

    Dr. Katzfuss's Website
  • Toggle Item
    2019-01-24 - Brennan Bean - Interval-Valued Kriging with Applications in Design Ground Snow Load Prediction

    Presenter:

    Brennan Bean

    Title:

    Interval-Valued Kriging with Applications in Design Ground Snow Load Prediction

    Affiliation:

    Utah State University

    Date:

    January 24, 2019

    Abstract:

    The load induced by snow on the roof of a structure is a serious design consideration in many western and northeastern states: under-estimating loads can lead to structure failure while over-estimating loads unnecessarily increases construction costs. Recent updates to the design ground snow load requirements in Utah use geostatistics models to produce design ground snow load estimates that have shown significantly improved accuracy. However, the model inputs are subject to several sources of uncertainty including measurement limitations, short observation periods, and shortcomings in the distribution fitting process, among others. Ignoring these uncertainties in the modeling process could result in critical information loss that robs the final predictions of proper context. One way to account for these uncertainties is to express the data by intervals, as opposed to single numbers. Interval-valued geostatistics models for uncertainty characterization were originally considered and studied in the late 1980s. However, those models suffer from several fundamental problems that limit their application. This presentation proposes to modify and improve the interval-valued kriging models proposed by Diamond (1989) based on recent developments of random set theory. The resulting new models are shown to have a more structured formulation and computational feasibility. A numerical implementation of these models is developed based on a modified Newton-Raphson algorithm and its finite sample performance is demonstrated through a simulation study. These models are applied to the Utah snow load dataset and produce an interval-valued version of the 2018 Utah Snow Load Study. The interesting and promising implications of these new results to design ground snow load and structural risk analysis will be thoroughly discussed.

    Website:

    Brennan's Webpage
  • Toggle Item
    2019-01-17 - Ron Reeder - Improving outcomes after pediatric cardiac arrest – a hybrid stepped-wedge trial

    Presenter:

    Ron Reeder

    Title:

    Improving outcomes after pediatric cardiac arrest – a hybrid stepped-wedge trial

    Affiliation:

    University of Utah

    Date:

    January 17, 2019

    Abstract:

    Quality of cardiopulmonary resuscitation (CPR) is associated with survival, but recommended guidelines are often not met, and less than half the children with an in-hospital arrest will survive to discharge. A single-center before-and-after study demonstrated that outcomes may be improved with a novel training program in which all pediatric intensive care unit staff are encouraged to participate in frequent CPR refresher training and regular, structured resuscitation debriefings focused on patient-centric physiology.

    I’ll present the design of an ongoing trial that will assess whether a program of structured debriefings and point-of-care bedside practice that emphasizes physiologic resuscitation targets improves the rate of survival to hospital discharge with favorable neurologic outcome in children receiving CPR in the intensive care unit. This study is designed as a hybrid stepped-wedge trial in which two of ten participating hospitals are randomly assigned to enroll in the intervention group and two are assigned to enroll in the control group for the duration of the trial. The remaining six hospitals enroll initially in the control group but will transition to enrolling in the intervention group at randomly assigned staggered times during the enrollment period.

    This trial is the first implementation of a hybrid stepped-wedge design. It was chosen over a traditional stepped-wedge design because the resulting improvement in statistical power reduces the required enrollment by 9 months (14%). However, this design comes with additional challenges, including logistics of implementing an intervention prior to the start of enrollment. Nevertheless, if results from the single-center pilot are confirmed in this trial, it will have a profound effect on CPR training and quality improvement initiatives.

    Website:

    Dr. Reeder's Website
  • Toggle Item
    2019-01-10 - Juan Rodriguez - Deep Learning to Save Humanity

    Presenter:

    Juan Rodriguez

    Title:

    Deep Learning to Save Humanity

    Affiliation:

    Recursion Pharmaceuticals

    Date:

    January 10, 2019

    Abstract:

    During the last 50 years, the advances in computational processing and storage have overshadowed the progress of most areas of research. At Recursion Pharmaceuticals we are translating these advances into biological results to change the way drug discovery is done. We are hyper-parallelizing the scientific method to discover new treatments for patients. This new approach presents unique statistical and mathematical challenges in the area of artificial intelligence and computer vision, which will be presented.

    Website:

    Company Website
  • Toggle Item
    2018-12-06 - Dennis Eggett - Making the best of messy data: A return to basics

    Presenter:

    Dr. Dennis Eggett

    Title:

    Making the best of messy data: A return to basics.

    Affiliation:

    BYU

    Date:

    December 6, 2018

    Abstract:

    When your data do not meet the basic assumptions of an analysis method, you have to go back to the basics in order to glean the information you need. Three data sets will be used to explore resampling methods based on the definition of a p-value and the central limit theorem. A simple two-sample t-test of a data set that is not near normal and does not conform to non-parametric methods is used to demonstrate resampling in its simplest form. A mixed model analysis of highly skewed data will be used to demonstrate how to maintain its structure through the resampling process. Finally, a very large data set is resampled to demonstrate finding parameter estimates and confidence intervals.
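    The simplest of these resampling schemes, the two-sample case, can be sketched directly from the definition of a p-value: permute the group labels and count how often the permuted statistic is at least as extreme as the observed one. The sketch below is illustrative and is not the talk's code or data.

```python
# Permutation p-value for a two-sample difference in means: the p-value is
# the proportion of label permutations whose statistic is at least as
# extreme as the observed one. Data and names are illustrative.
import random

def permutation_p_value(x, y, n_perm=10_000, seed=0):
    rng = random.Random(seed)
    observed = abs(sum(x) / len(x) - sum(y) / len(y))
    pooled = list(x) + list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)                       # relabel the pooled data
        xs, ys = pooled[:len(x)], pooled[len(x):]
        if abs(sum(xs) / len(xs) - sum(ys) / len(ys)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # add-one keeps the estimate off zero
```

    No normality or rank assumptions are needed, which is what makes the approach attractive for data that fit neither the t-test nor standard non-parametric methods.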

    Website:

    Dr. Eggett's Webpage
  • Toggle Item
    2018-11-29 - Bruno Sanso - Multi-Scale Models for Large Non-Stationary Spatial Datasets

    Presenter:

    Bruno Sanso

    Title:

    Multi-Scale Models for Large Non-Stationary Spatial Datasets

    Affiliation:

    University of California Santa Cruz

    Date:

    November 29, 2018

    Abstract:

    Large spatial datasets often exhibit features that vary at different scales as well as at different locations. To model random fields whose variability changes at differing scales we use multiscale kernel convolution models. These models rely on nested grids of knots at different resolutions. Thus, lower order terms capture large scale features, while high order terms capture small scale ones. In this talk we consider two approaches to fitting multi-resolution models with space-varying characteristics. In the first approach, to accommodate the space-varying nature of the variability, we consider priors for the coefficients of the kernel expansion that are structured to provide increasing shrinkage as the resolution grows. Moreover, a tree shrinkage prior auto-tunes the degree of resolution necessary to model a subregion in the domain. In addition, compactly supported kernel functions allow local updating of the model parameters which achieves massive scalability by suitable parallelization. As an alternative, we develop an approach that relies on knot selection, rather than shrinkage, to achieve parsimony, and discuss how this induces a field with spatially varying resolution. We extend shotgun stochastic search to the multi-resolution model setting, and demonstrate that this method is computationally competitive and produces excellent fit to both synthetic and real datasets.

    Website:

    Dr. Sanso's Website
  • Toggle Item
    2018-11-15 - Margie Rosenberg - Unsupervised Clustering Techniques using all Categorical Variables

    Presenter:

    Margie Rosenberg

    Title:

    Unsupervised Clustering Techniques using all Categorical Variables

    Affiliation:

    University of Wisconsin-Madison

    Date:

    November 15, 2018

    Abstract:

    We present a case study to illustrate a novel way of clustering individuals to create groups of similar individuals where covariates are all categorical. Our method is especially useful when applied to multi-level categorical data where there is no inherent order in the variable, like race. We use data from the National Health Interview Survey (NHIS) to form the clusters and apply these clusters for prediction purposes to the Medical Expenditure Panel Survey (MEPS). Our approach considers the person-weighting of the surveys to produce clusters and estimates of expenditures per cluster that are representative of the US adult civilian non-institutionalized population. For our clustering method, we apply the K-Medoids approach with an adapted version of the Goodall dissimilarity index. We validate our approach on independent NHIS/MEPS data from a different panel. Our results indicate the robustness of the clusters across years and indicate the ability to distinguish clusters for the predictability of expenditures.
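    The K-Medoids step can be sketched on toy data. The talk uses an adapted Goodall dissimilarity and survey weights; for brevity, a simple matching (Hamming) dissimilarity and a deterministic initialization stand in here, and all data and names are illustrative.

```python
# Toy K-Medoids over all-categorical rows. A medoid is an actual data row,
# so any dissimilarity over categories works without needing means.
def matching_dissim(a, b):
    """Fraction of categorical attributes on which two rows disagree."""
    return sum(u != v for u, v in zip(a, b)) / len(a)

def k_medoids(rows, k, n_iter=20):
    medoids = list(range(k))  # simple init; practice uses random restarts
    clusters = {}
    for _ in range(n_iter):
        # Assignment step: each row joins its closest medoid.
        clusters = {m: [] for m in medoids}
        for i, r in enumerate(rows):
            best = min(medoids, key=lambda m: matching_dissim(r, rows[m]))
            clusters[best].append(i)
        # Update step: a cluster's new medoid minimizes total dissimilarity.
        new = [min(members,
                   key=lambda c: sum(matching_dissim(rows[c], rows[j])
                                     for j in members))
               for members in clusters.values() if members]
        if sorted(new) == sorted(medoids):
            break
        medoids = new
    return medoids, clusters
```

    Swapping `matching_dissim` for a Goodall-style index, which down-weights agreement on common categories, changes only the dissimilarity function, not the algorithm.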

    Website:

    Dr. Rosenberg's Website

  • Toggle Item
    2018-11-08 - Terrance Savitsky - Bayesian Uncertainty Estimation under Complex Sampling

    Presenter:

    Terrance Savitsky

    Title:

    Bayesian Uncertainty Estimation under Complex Sampling

    Affiliation:

    Bureau of Labor Statistics

    Date:

    November 8, 2018

    Abstract:

    Multistage, unequal probability sampling designs utilized by federal statistical agencies are typically constructed to maximize the efficiency of the target domain level estimator (e.g., indexed by geographic area) within cost constraints for survey administration. Such designs may induce dependence between the sampled units; for example, with employment of a sampling step that selects geographically-indexed clusters of units. A sampling-weighted pseudo-posterior distribution may be used to estimate the population model on the observed sample. The dependence induced between co-clustered units inflates the scale of the resulting pseudo-posterior covariance matrix, which has been shown to induce undercoverage of the credibility sets. We demonstrate that the scale and shape of the asymptotic distributions are different between each of the pseudo-MLE, the pseudo-posterior and the MLE under simple random sampling. We devise a correction applied as a simple and fast post-processing step to MCMC draws of the pseudo-posterior distribution that projects the pseudo-posterior covariance matrix such that the nominal coverage is approximately achieved. We demonstrate the efficacy of our scale and shape projection procedure on synthetic data and make an application to the National Survey on Drug Use and Health.

    Website:

  • Toggle Item
    2018-11-01 - Dustin Harding - How Renting Products Increases Consumer Confidence and Commitment

    Presenter:

    Dustin Harding

    Title:

    How Renting Products Increases Consumer Confidence and Commitment

    Affiliation:

    UVU

    Date:

    November 1, 2018

    Abstract:

    Consumers can obtain skill-based products through a variety of acquisition modes, such as purchase or rental. Despite the rise of nonpurchase acquisition modes, surprisingly little research has explored the effects of differential acquisition modes on consumer behavior. This research begins to fill this gap in the literature by examining the effect of acquisition mode on the expected time necessary to master newly adopted skill-based products and the downstream consequences for consumers and marketers. Results of four experiments and a field study show that purchasing, versus renting, products requiring skill-based learning increases the amount of time consumers expect to be required to master them. Further, the differences in speed of product mastery, in turn, impact subsequent consumer behavior via differential levels of product use commitment.

    Website:

    Dr. Harding's Website
  • Toggle Item
    2018-10-25 - Alex Petersen - Wasserstein Regression and Covariance for Random Densities

    Presenter:

    Alex Petersen

    Title:

    Wasserstein Regression and Covariance for Random Densities

    Affiliation:

    UC Santa Barbara

    Date:

    October 25, 2018

    Abstract:

    Samples of density functions appear in a variety of disciplines, including distributions of mortality across nations, CT density histograms of hematoma in post-stroke patients, and distributions of voxel-to-voxel correlations of fMRI signals across subjects. The nonlinear nature of density space necessitates adaptations and new methodologies for the analysis of random densities. We define our geometry using the Wasserstein metric, an increasingly popular choice in theory and application. First, when densities appear as responses in a regression model, the utility of Fréchet regression, a general purpose methodology for response objects in a metric space, is demonstrated. Due to the manifold structure of the space, inferential methods are developed allowing for tests of global and partial effects, as well as simultaneous confidence bands for fitted densities. Second, a notion of Wasserstein covariance is proposed for multivariate density data (a vector of densities), where multiple densities are observed for each subject. This interpretable dependence measure is shown to reveal interesting differences in functional connectivity between a group of Alzheimer's subjects and a control group.
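    The 1-D Wasserstein geometry the talk builds on has a convenient closed form: the 2-Wasserstein distance is the L2 distance between quantile functions, which for two equal-size samples reduces to matching sorted values. A minimal sketch (illustrative, not the authors' code):

```python
# Empirical 2-Wasserstein distance between equal-size 1-D samples:
# sort both samples and compare order statistics.
import numpy as np

def wasserstein2(x, y):
    x, y = np.sort(np.asarray(x)), np.sort(np.asarray(y))
    return float(np.sqrt(np.mean((x - y) ** 2)))
```

    Shifting a distribution by a constant c moves it a Wasserstein distance of exactly |c|, one reason the metric behaves intuitively for samples of densities such as mortality distributions.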

    Website:

    Dr. Petersen's Website
  • Toggle Item
    2018-10-18 - Abel Rodriguez - Spherical Factor Analysis for Binary Data: A Look at the Conservative Revolt in the US House of Representatives

    Presenter:

    Abel Rodriguez

    Title:

    Spherical Factor Analysis for Binary Data: A Look at the Conservative Revolt in the US House of Representatives

    Affiliation:

    UC Santa Cruz

    Date:

    October 18, 2018

    Abstract:

    Factor models for binary data are extremely common in many social science disciplines. For example, in political science binary factor models are often used to explain voting patterns in deliberative bodies such as the US Congress, leading to an “ideological” ranking of legislators. Binary factor models can be motivated through so-called “spatial” voting models, which posit that legislators have a most preferred policy – their ideal point – which can be represented as a point in some Euclidean “policy space”. Legislators then vote for/against motions in accordance with the distance between their (latent) preferences and the position of the bill in the same policy space. In this talk we introduce a novel class of binary factor models derived from spatial voting models in which the policy space corresponds to a non-Euclidean manifold. In particular, we consider embedding legislators' preferences on the surface of an n-dimensional sphere. The resulting model contains the standard binary Euclidean factor model as a limiting case, and provides a mechanism to operationalize (and extend) the so-called “horseshoe theory” in political science, which postulates that the far-left and far-right are more similar to each other in essentials than either are to the political center. The performance of the model is illustrated using voting data from recent US Congresses. In particular, we show that voting patterns for the 113th US House of Representatives are better explained by a circular factor model than by either a one- or a two-dimensional Euclidean model, and that the circular model yields a ranking of legislators more in accord with experts' expectations.

    Website:

    Dr. Rodriguez's Website
  • Toggle Item
    2018-09-20 - Scott Grimshaw - Going Viral, Binge Watching, and Attention Cannibalism

    Presenter:

    Dr. Scott Grimshaw

    Title:

    Going Viral, Binge Watching, and Attention Cannibalism

    Affiliation:

    BYU

    Date:

    September 20, 2018

    Abstract:

    Since digital entertainment is often described as viral, this paper uses the vocabulary and statistical methods for diseases to analyze viewer data from an experiment at BYUtv where a program's premiere was exclusively digital. Onset time, the days from the program premiere to a viewer watching the first episode, is modeled using a changepoint between epidemic viewing with a non-constant hazard rate and endemic viewing with a constant hazard rate. Finish time, the days from onset to a viewer watching all episodes, uses an expanded negative binomial hurdle model to reflect the characteristics of binge watching. The hurdle component models binge racing, where a viewer watches all episodes on the same day as onset. One reason binge watching appeals to viewers is that they can focus attention on a single program's story line and characters before moving on to a second program. This translates to a competing risks model that has an impact on scheduling digital premieres. Attention cannibalism occurs when a viewer takes a long time watching their first choice program and then never watches a second program or delays watching the second program until much later. Scheduling a difference in premieres reduces attention cannibalism.

    Website:

    Dr. Grimshaw's website
  • Toggle Item
    2018-04-12 - Cristian Tomasetti - Cancer etiology, evolution and early detection

    Presenter:

    Dr. Cristian Tomasetti

    Title:

    Cancer etiology, evolution, and early detection

    Affiliation:

    Johns Hopkins University School of Medicine

    Date:

    Apr 12, 2018

    Abstract:

    The standard paradigm in cancer etiology is that inherited factors and lifestyle or environmental exposures are the causes of cancer. I will present recent findings indicating that a third cause, never considered before, plays a large role: “bad luck”, i.e., the pure chance involved in DNA replication when cells divide. Novel mathematical and statistical methodologies for distinguishing among these causes will also be introduced. I will then conclude with a new approach for the early detection of cancer.

    Website:

    Dr. Tomasetti's Website
  • Toggle Item
    2018-03-29 - H. Dennis Tolley - What's the Likelihood?

    Presenter:

    H. Dennis Tolley

    Title:

    What's the Likelihood?

    Affiliation:

    BYU

    Date:

    Mar 29, 2018

    Abstract:

    The likelihood function plays a major role in both frequentist and Bayesian methods of data analysis. Non-parametric Bayesian models also rely heavily on the form of the likelihood. Despite its heuristic foundation, the likelihood has several desirable large sample statistical properties that prompt its use among frequentists. Additionally, there are other important facets of the likelihood that warrant its formulation in many circumstances. As fundamental as the likelihood is, however, beginning students are only given a cursory introduction into how to formulate the likelihood. This seminar illustrates the formulation of the likelihood for a family of statistical problems common in the physical sciences. By examining the basic scientific principles associated with an experimental set-up, we show the step by step construction of the likelihood, starting with the discrete random walk model as a paradigm. The resulting likelihood is the solution to a stochastic differential equation. Elementary applications of the likelihood are illustrated.

    Website:

    Dr. Tolley's website
  • Toggle Item
    2018-03-22 - Matthew Heaton - Methods for Analyzing Large Spatial Data: A Review and Comparison

    Presenter:

    Dr. Matthew Heaton

    Title:

    Methods for Analyzing Large Spatial Data: A Review and Comparison

    Affiliation:

    BYU

    Date:

    Mar 22, 2018

    Abstract:

    The Gaussian process is an indispensable tool for spatial data analysts. The onset of the “big data” era, however, has led to the traditional Gaussian process being computationally infeasible for modern spatial data. As such, various alternatives to the full Gaussian process that are more amenable to handling big spatial data have been proposed. These modern methods often exploit low rank structures and/or multi-core and multi-threaded computing environments to facilitate computation. This study provides, first, an introductory overview of several methods for analyzing large spatial data. Second, this study describes the results of a predictive competition among the described methods as implemented by different groups with strong expertise in the methodology. Specifically, each research group was provided with two training datasets (one simulated and one observed) along with a set of prediction locations. Each group then wrote their own implementation of their method to produce predictions at the given locations, each of which was subsequently run on a common computing environment. The methods were then compared in terms of various predictive diagnostics.

    Website:

    Dr. Heaton's website
  • 2018-03-15 - Timothy Hanson - A unified framework for fitting Bayesian semiparametric models to arbitrarily censored spatial survival data

    Presenter:

    Timothy Hanson

    Title:

    A unified framework for fitting Bayesian semiparametric models to arbitrarily censored spatial survival data

    Affiliation:

    Medtronic

    Date:

    Mar 15, 2018

    Abstract:

    A comprehensive, unified approach to modeling arbitrarily censored spatial survival data is presented for the three most commonly used semiparametric models: proportional hazards, proportional odds, and accelerated failure time. Unlike many other approaches, all manner of censored survival times are simultaneously accommodated including uncensored, interval censored, current-status, left and right censored, and mixtures of these. Left truncated data are also accommodated leading to models for time-dependent covariates. Both georeferenced (location observed exactly) and areally observed (location known up to a geographic unit such as a county) spatial locations are handled. Variable selection is also incorporated. Model fit is assessed with conditional Cox-Snell residuals, and model choice carried out via LPML and DIC. Baseline survival is modeled with a novel transformed Bernstein polynomial prior. All models are fit via new functions which call efficient compiled C++ in the R package spBayesSurv. The methodology is broadly illustrated with simulations and real data applications. An important finding is that proportional odds and accelerated failure time models often fit significantly better than the commonly used proportional hazards model.
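    The "every censoring type in one likelihood" idea can be illustrated outside spBayesSurv with a toy exponential model. This is a hypothetical sketch, far simpler than the transformed Bernstein polynomial machinery of the talk: each censoring type simply contributes a different term to the log-likelihood.

```python
import math

def exp_surv(t, rate):
    """Survival function of an Exponential(rate) lifetime."""
    return math.exp(-rate * t)

def censored_loglik(obs, rate):
    """Log-likelihood under an exponential survival model, accommodating
    every censoring type in one sum:
      ('exact', t)       -> log f(t)
      ('right', t)       -> log S(t)
      ('left', t)        -> log F(t), i.e. log(1 - S(t))
      ('interval', l, u) -> log(S(l) - S(u))
    """
    ll = 0.0
    for rec in obs:
        kind, a = rec[0], rec[1]
        if kind == 'exact':
            ll += math.log(rate) - rate * a
        elif kind == 'right':
            ll += -rate * a
        elif kind == 'left':
            ll += math.log(1 - exp_surv(a, rate))
        elif kind == 'interval':
            ll += math.log(exp_surv(a, rate) - exp_surv(rec[2], rate))
    return ll
```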

    Website:

    Dr. Hanson's LinkedIn

  • 2018-03-08 - Daniel Nettleton - Random Forest Prediction Intervals

    Presenter:

    Dr. Daniel Nettleton

    Title:

    Random Forest Prediction Intervals

    Affiliation:

    Iowa State University

    Date:

    Mar 8, 2018

    Abstract:

    Breiman's seminal paper on random forests has more than 30,000 citations according to Google Scholar. The impact of Breiman's random forests on machine learning, data analysis, data science, and science in general is difficult to measure but unquestionably substantial. The virtues of random forest methodology include no need to specify functional forms relating predictors to a response variable, capable performance for low-sample-size high-dimensional data, general prediction accuracy, easy parallelization, few tuning parameters, and applicability to a wide range of prediction problems with categorical or continuous responses. Like many algorithmic approaches to prediction, random forests are typically used to produce point predictions that are not accompanied by information about how far those predictions may be from true response values. From the statistical point of view, this is unacceptable; a key characteristic that distinguishes statistically rigorous approaches to prediction from others is the ability to provide quantifiably accurate assessments of prediction error from the same data used to generate point predictions. Thus, we develop a prediction interval -- based on a random forest prediction -- that gives a range of values that will contain an unknown continuous univariate response with any specified level of confidence. We illustrate our proposed approach to interval construction with examples and demonstrate its effectiveness relative to other approaches for interval construction using random forests.
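    One simple construction in this spirit (a hypothetical sketch, not necessarily the authors' exact proposal) attaches empirical quantiles of the forest's out-of-bag residuals to each point prediction; any specified level of confidence then corresponds to a pair of residual quantiles.

```python
import statistics

def residual_interval(point_pred, oob_residuals, confidence=0.90):
    """Prediction interval built from out-of-bag (OOB) residuals.

    oob_residuals holds observed minus OOB-predicted values for the
    training cases; shifting the point prediction by their empirical
    quantiles gives an interval with roughly the requested coverage.
    """
    pct = statistics.quantiles(oob_residuals, n=100, method='inclusive')
    alpha = 1 - confidence
    lo = pct[round(100 * alpha / 2) - 1]        # e.g. the 5th percentile
    hi = pct[round(100 * (1 - alpha / 2)) - 1]  # e.g. the 95th percentile
    return point_pred + lo, point_pred + hi
```

    With confidence=0.90 the interval runs from the 5th to the 95th percentile of the residuals, so heavier-tailed residuals automatically widen the interval.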

    Website:

    Dr. Nettleton's website
  • 2018-02-22 - Robert Richardson - Non-Gaussian Translation Processes

    Presenter:

    Robert Richardson

    Title:

    Non-Gaussian Translation Processes

    Affiliation:

    BYU

    Date:

    Feb 22, 2018

    Abstract:

    A non-Gaussian translation process is a method used in some engineering applications in which a stochastic process is given non-Gaussian marginal distributions. It could be considered a hierarchical copula model, where the correlation structure of the process is defined separately from the marginal distributional characteristics. This approach also yields a simple likelihood function for the finite-dimensional distributions of the stochastic process. These processes will be shown, in a few applications, either to perform tasks that could not be done previously or to perform them much more efficiently: non-Gaussian option pricing, general multivariate stable spatial processes, and non-Gaussian spatio-temporal dynamic modeling.
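    The construction itself is compact: take a Gaussian process Z with the desired correlation, push it through the standard normal CDF, and apply the inverse of the target marginal CDF, Y = F⁻¹(Φ(Z)). The sketch below (all numerical values are hypothetical) maps an AR(1) Gaussian series to exponential marginals while leaving the dependence structure to the Gaussian layer.

```python
import math
import random

def std_normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def exp_inv_cdf(u, rate=1.0):
    """Inverse CDF of an Exponential(rate) distribution."""
    return -math.log(1 - u) / rate

def translate(z_path, inv_cdf):
    """Translation process: Y = F^{-1}(Phi(Z)).  The correlation comes
    from the Gaussian path Z; the marginals come from F."""
    return [inv_cdf(std_normal_cdf(z)) for z in z_path]

rng = random.Random(1)
phi = 0.8  # AR(1) coefficient; stationary N(0, 1) marginals
z = [rng.gauss(0, 1)]
for _ in range(999):
    z.append(phi * z[-1] + rng.gauss(0, math.sqrt(1 - phi ** 2)))
y = translate(z, exp_inv_cdf)  # correlated series with exponential marginals
```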

    Website:

    Dr. Richardson's Website
  • 2018-02-15 - Jeffery Tessem - How to make more beta cells: exploring molecular pathways that increase functional beta cell mass as a cure for Type 1 and Type 2 diabetes

    Presenter:

    Dr. Jeffery S Tessem

    Title:

    How to make more beta cells: exploring molecular pathways that increase functional beta cell mass as a cure for Type 1 and Type 2 diabetes

    Affiliation:

    Department of Nutrition, Dietetics and Food Science at BYU

    Date:

    Feb 15, 2018

    Abstract:

    Both Type 1 (T1D) and Type 2 diabetes (T2D) are caused by a relative insufficiency in functional β-cell mass. Current therapeutic options for diabetes include daily insulin injections to maintain normoglycemia, pharmacological agents to stimulate β-cell function and enhance insulin sensitivity, or islet transplantation. A major obstacle to greater application of islet transplantation therapy is the scarcity of human islets. Thus, new methods for expansion of β-cell mass, applied in vitro to generate the large numbers of human islet cells needed for transplantation, or in situ to induce expansion of the patient's remaining β-cells, could have broad therapeutic implications for this disease. To this end, our lab is interested in delineating the molecular pathways that increase β-cell proliferation, enhance glucose stimulated insulin secretion, and protect against β-cell death.

    Website:

    Dr. Tessem's Website
  • 2018-02-08 - Chris Groendyke - Bayesian Inference for Contact Network Models using Epidemic Data

    Presenter:

    Chris Groendyke

    Title:

    Bayesian Inference for Contact Network Models using Epidemic Data

    Affiliation:

    Robert Morris University

    Date:

    Feb 8, 2018

    Abstract:

    I will discuss how network models can be used to study the spread of epidemics through a population, and in turn what epidemics can tell us about the structure of this population. I apply a Bayesian methodology to data from a disease presumed to have spread across a contact network in a population in order to perform inference on the parameters of the underlying network and disease models. Using a simulation study, I will discuss the strengths, weaknesses, and limitations of these models, and the data required for this type of inference. Finally, I will describe an analysis of an actual measles epidemic that spread through the town of Hagelloch, Germany, in 1861 and share the conclusions it allows us to make regarding the population structure.
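    The forward model underlying such inference can be sketched in a few lines. The following is a hypothetical chain-binomial simulator, not the talk's actual code: given a contact network and a per-contact transmission probability, it simulates which nodes are ultimately infected.

```python
import random

def simulate_sir(adj, patient_zero, p_transmit, rng):
    """Chain-binomial SIR epidemic on a contact network.

    adj maps node -> list of neighbours.  Each generation, every
    infectious node independently infects each susceptible neighbour
    with probability p_transmit, then recovers.
    """
    susceptible = set(adj) - {patient_zero}
    infectious = {patient_zero}
    recovered = set()
    while infectious:
        new_inf = set()
        for node in infectious:
            for nb in adj[node]:
                if nb in susceptible and rng.random() < p_transmit:
                    new_inf.add(nb)
        susceptible -= new_inf
        recovered |= infectious
        infectious = new_inf
    return recovered  # the set of nodes ever infected

# Small example: a line network 0-1-2-3-4
line = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
outbreak = simulate_sir(line, 0, 1.0, random.Random(0))
```

    Bayesian inference then treats the network and transmission parameters as unknowns and asks which values make the observed outbreak likely.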

    Website:

    Chris's Website
  • 2018-02-01 - Larry Baxter - Structure in Prior PDFs and Its Effect on Bayesian Analysis

    Presenter:

    Larry Baxter

    Title:

    Structure in Prior PDFs and Its Effect on Bayesian Analysis

    Affiliation:

    BYU

    Date:

    Feb 1, 2018

    Abstract:

    Bayesian statistics formalizes a procedure for combining established (prior) statistical knowledge with current knowledge to produce a posterior statistical description that presumably is better than either the prior or new knowledge by itself. Two common applications of this theory involve (a) combining established (literature) estimates of model parameters with new data to produce better parameter estimates, and (b) estimating model prediction confidence bands. Frequently, the prior information includes reasonable parameter estimates, poorly quantified and often subjective parameter uncertainty estimates, and no information regarding how the values of one parameter affect the confidence intervals of other parameters. All three of these parameter characteristics affect Bayesian analysis. The first two receive a great deal of attention. The third characteristic, the dependence of model parameters on one another, creates structure in the prior pdfs. This structure strongly influences Bayesian results, often to an extent that rivals or surpasses the parameter uncertainty best estimates. Nevertheless, Bayesian analyses commonly ignore this structure.
    All structure stems primarily from the form of the model and, in linear models, does not depend on the observations themselves. Most models produce correlated parameters when applied to real-world engineering and science data. The most common example of structure is parameter correlation coefficients. Linear models produce linear parameter correlations that depend on the magnitude of the independent variable under analysis but that in most practical applications produce large, often close to unity, correlation coefficients. Nonlinear models also generally have correlated parameters. However, the correlations can be nonlinear, even discontinuous, and generally involve more complexity than linear model parameter correlations. Parameter correlations profoundly affect the results of Bayesian parameter estimation and prediction uncertainty. Properly incorporated structure produces Bayesian results that powerfully illustrate the strength and potential contribution of the theory. Bayesian analyses that ignore such structure produce poor or even nonsensical results, often significantly worse than a superficial guess.
    This seminar demonstrates the importance of prior structure in both parameter estimation and uncertainty quantification using real data from typical engineering systems. Perhaps most importantly, the discussion illustrates methods of incorporating parameter structure for any given model, methods that do not rely on the observations. These methods quantify parameter structure, including the lack of structure, for linear and nonlinear models.
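    The claim about linear models is easy to check directly: in simple linear regression the correlation between the intercept and slope estimates is read off (X'X)⁻¹ alone, so it never involves the observations y, and it approaches ±1 as the predictor values move away from zero. The sketch below uses hypothetical design values to show both regimes.

```python
import math

def slope_intercept_corr(xs):
    """Correlation between intercept and slope estimates in the simple
    linear regression y = a + b*x + error, read off (X'X)^{-1}:
        corr(a_hat, b_hat) = -sum(x) / sqrt(n * sum(x^2)).
    The responses y never enter: the structure comes from the design alone.
    """
    n = len(xs)
    sx = sum(xs)
    sxx = sum(v * v for v in xs)
    return -sx / math.sqrt(n * sxx)

centered = [-2, -1, 0, 1, 2]          # predictor centered at zero
shifted = [98, 99, 100, 101, 102]     # same spread, far from zero
```

    Centering the predictor removes the correlation entirely, while the shifted design drives it to nearly -1, matching the abstract's observation that practical designs often produce near-unity correlations.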

    Website:

    Larry's Website
  • 2018-01-18 - Brad Barney - Growing Curve Methodology with Application to Neonatal Growth Curves

    Presenter:

    Brad Barney

    Title:

    Growing Curve Methodology with Application to Neonatal Growth Curves

    Affiliation:

    BYU

    Date:

    Jan 18, 2018

    Abstract:

    As part of postnatal care, newborns are routinely monitored to assess the stability and adequacy of their growth. Interest lies in learning about the typical postnatal growth of infants, especially preterm infants. We briefly consider some general methodological strategies currently employed to parsimoniously construct growth curves for use in medical practice. We present original results using existing methodology known as generalized additive models for location, scale and shape (GAMLSS). We also expand existing methodology on the Bayesian analogue of GAMLSS, known as structured additive distributional regression. In particular, we hierarchically model weight and length jointly, from which we are able to induce a time-varying distribution for Body Mass Index.
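    The idea of inducing a BMI distribution from a joint weight-length model can be illustrated with a toy Monte Carlo. All numbers below are hypothetical, and a bivariate normal stands in for the hierarchical model of the talk: sample weight and length jointly at a given age, form BMI = weight / length², and read off quantiles.

```python
import random
import statistics

def induced_bmi_quantiles(weight_mean, weight_sd, length_mean, length_sd,
                          rho, n=20000, rng=None):
    """Monte Carlo sketch of the BMI distribution induced by a joint
    (bivariate normal) model for weight (kg) and length (m):
    BMI = weight / length**2.  Returns rough 10th/50th/90th percentiles.
    """
    rng = rng or random.Random(0)
    bmis = []
    for _ in range(n):
        z1 = rng.gauss(0, 1)
        z2 = rho * z1 + (1 - rho * rho) ** 0.5 * rng.gauss(0, 1)
        w = weight_mean + weight_sd * z1
        l = length_mean + length_sd * z2
        bmis.append(w / (l * l))
    qs = statistics.quantiles(bmis, n=10, method='inclusive')
    return qs[0], qs[4], qs[8]   # 10th, 50th, 90th percentiles
```

    Repeating this at each age, with the weight and length parameters varying over time, yields the kind of time-varying BMI distribution the abstract describes.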

    Co-Authors:

    Adrienne Williamson, Josip Derado, Gregory Saunders, Irene Olsen, Reese Clark, Louise Lawson, Garritt Page, and Miguel de Carvalho

    Website:

    Brad's page