Seminars

2018-11-29 - Bruno Sanso - Multi-Scale Models for Large Non-Stationary Spatial Datasets

Presenter:

Bruno Sanso

Title:

Multi-Scale Models for Large Non-Stationary Spatial Datasets

Affiliation:

University of California Santa Cruz

Date:

November 29, 2018

Abstract:

Large spatial datasets often exhibit features that vary at different scales as well as at different locations. To model random fields whose variability changes at differing scales, we use multiscale kernel convolution models. These models rely on nested grids of knots at different resolutions: lower-order terms capture large-scale features, while higher-order terms capture small-scale ones. In this talk we consider two approaches to fitting multi-resolution models with space-varying characteristics. In the first approach, to accommodate the space-varying nature of the variability, we consider priors for the coefficients of the kernel expansion that are structured to provide increasing shrinkage as the resolution grows. Moreover, a tree shrinkage prior auto-tunes the degree of resolution necessary to model a subregion of the domain. In addition, compactly supported kernel functions allow local updating of the model parameters, which achieves massive scalability through suitable parallelization. As an alternative, we develop an approach that relies on knot selection, rather than shrinkage, to achieve parsimony, and discuss how this induces a field with spatially varying resolution. We extend shotgun stochastic search to the multi-resolution model setting, and demonstrate that this method is computationally competitive and produces excellent fits to both synthetic and real datasets.
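A minimal sketch of the multi-resolution kernel-convolution idea behind the talk, assuming a one-dimensional domain, a Wendland-type compactly supported kernel, and a simple geometric shrinkage of coefficient scales across resolutions (all illustrative choices, not the speaker's model or priors):

```python
import numpy as np

rng = np.random.default_rng(0)

def compact_kernel(d, bandwidth):
    """Wendland-type compactly supported kernel; exactly zero beyond `bandwidth`."""
    r = np.clip(d / bandwidth, 0.0, 1.0)
    return (1.0 - r) ** 4 * (4.0 * r + 1.0)

def simulate_field(x, n_resolutions=3, base_knots=5, shrink=0.5):
    """Sum of kernel expansions over nested knot grids: level m uses
    base_knots * 2**m knots and coefficient s.d. shrink**m, so higher
    resolutions contribute only small-scale detail."""
    field = np.zeros_like(x)
    for m in range(n_resolutions):
        knots = np.linspace(0, 1, base_knots * 2 ** m)
        bandwidth = 2.0 / len(knots)              # each kernel spans a few knots
        coefs = (shrink ** m) * rng.standard_normal(len(knots))
        field += compact_kernel(np.abs(x[:, None] - knots[None, :]), bandwidth) @ coefs
    return field

x = np.linspace(0, 1, 500)
y = simulate_field(x)                             # one realization of the field
```

Because the kernels are compactly supported, each coefficient affects only a small neighborhood, which is what makes local, parallel updating possible in the full model.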

Website:

Dr. Sanso's Website

2018-11-15 - Margie Rosenberg - Unsupervised Clustering Techniques using all Categorical Variables

Presenter:

Margie Rosenberg

Title:

Unsupervised Clustering Techniques using all Categorical Variables

Affiliation:

University of Wisconsin-Madison

Date:

November 15, 2018

Abstract:

We present a case study to illustrate a novel way of clustering individuals into groups of similar individuals when the covariates are all categorical. Our method is especially useful when applied to multi-level categorical data with no inherent ordering in the variable, such as race. We use data from the National Health Interview Survey (NHIS) to form the clusters and apply these clusters for prediction purposes to the Medical Expenditure Panel Survey (MEPS). Our approach considers the person-weighting of the surveys to produce clusters and estimates of expenditures per cluster that are representative of the US adult civilian non-institutionalized population. For our clustering method, we apply the K-Medoids approach with an adapted version of the Goodall dissimilarity index. We validate our approach on independent NHIS/MEPS data from a different panel. Our results indicate that the clusters are robust across years and that they distinguish groups useful for predicting expenditures.
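A schematic of the clustering step, assuming a simple matching dissimilarity as a stand-in for the adapted Goodall index and ignoring the survey person-weights; the data, the `dissimilarity` function, and the plain k-medoids loop are all illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.integers(0, 3, size=(200, 6))         # 200 records, 6 categorical variables

def dissimilarity(a, b):
    """Simple matching distance: fraction of variables on which two records differ."""
    return np.mean(a != b)

D = np.array([[dissimilarity(a, b) for b in X] for a in X])

def k_medoids(D, k=4, n_iter=20):
    medoids = rng.choice(len(D), size=k, replace=False)
    for _ in range(n_iter):
        labels = np.argmin(D[:, medoids], axis=1)
        new_medoids = medoids.copy()
        for j in range(k):
            members = np.where(labels == j)[0]
            if len(members):
                # new medoid = member with smallest total dissimilarity to its cluster
                new_medoids[j] = members[np.argmin(D[np.ix_(members, members)].sum(axis=1))]
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids
    return medoids, labels

medoids, labels = k_medoids(D)
```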

Website:

Dr. Rosenberg's Website

2018-11-08 - Terrance Savitsky - Bayesian Uncertainty Estimation under Complex Sampling

Presenter:

Terrance Savitsky

Title:

Bayesian Uncertainty Estimation under Complex Sampling

Affiliation:

Bureau of Labor Statistics

Date:

November 8, 2018

Abstract:

Multistage, unequal probability sampling designs utilized by federal statistical agencies are typically constructed to maximize the efficiency of the target domain-level estimator (e.g., indexed by geographic area) within cost constraints for survey administration. Such designs may induce dependence between the sampled units; for example, through a sampling step that selects geographically indexed clusters of units. A sampling-weighted pseudo-posterior distribution may be used to estimate the population model on the observed sample. The dependence induced between co-clustered units inflates the scale of the resulting pseudo-posterior covariance matrix, which has been shown to induce undercoverage of the credible sets. We demonstrate that the scale and shape of the asymptotic distributions differ among the pseudo-MLE, the pseudo-posterior, and the MLE under simple random sampling. We devise a correction, applied as a simple and fast post-processing step to MCMC draws from the pseudo-posterior distribution, that projects the pseudo-posterior covariance matrix such that the nominal coverage is approximately achieved. We demonstrate the efficacy of our scale and shape projection procedure on synthetic data and in an application to the National Survey on Drug Use and Health.
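A rough sketch of the post-processing idea, assuming the corrected ("sandwich"-type) covariance matrix has already been estimated; the linear projection below simply rescales and reshapes centered MCMC draws so that their covariance matches that target, and the synthetic example is hypothetical:

```python
import numpy as np

def project_draws(draws, corrected_cov):
    """draws: (n_draws, p) pseudo-posterior samples; corrected_cov: (p, p) target."""
    mean = draws.mean(axis=0)
    centered = draws - mean
    L_post = np.linalg.cholesky(np.cov(centered, rowvar=False))
    L_corr = np.linalg.cholesky(corrected_cov)
    A = L_corr @ np.linalg.inv(L_post)        # maps the pseudo-posterior covariance onto the target
    return mean + centered @ A.T

# Hypothetical example: draws whose spread understates the target by a factor of 2
rng = np.random.default_rng(2)
V = np.array([[1.0, 0.3], [0.3, 0.5]])
draws = rng.multivariate_normal([0.0, 0.0], V, size=5000)
adjusted = project_draws(draws, corrected_cov=4.0 * V)
```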

Website:

2018-11-01 - Dustin Harding - How Renting Products Increases Consumer Confidence and Commitment

Presenter:

Dustin Harding

Title:

How Renting Products Increases Consumer Confidence and Commitment

Affiliation:

UVU

Date:

November 1, 2018

Abstract:

Consumers can obtain skill-based products through a variety of acquisition modes, such as purchase or rental. Despite the rise of nonpurchase acquisition modes, surprisingly little research has explored the effects of different acquisition modes on consumer behavior. This research begins to fill this gap in the literature by examining the effect of acquisition mode on the expected time necessary to master newly adopted skill-based products and the downstream consequences for consumers and marketers. Results of four experiments and a field study show that purchasing, versus renting, products requiring skill-based learning increases the amount of time consumers expect to need to master them. These differences in expected speed of product mastery, in turn, impact subsequent consumer behavior via differential levels of product use commitment.

Website:

Dr. Harding's Website

2018-10-25 - Alex Petersen - Wasserstein Regression and Covariance for Random Densities

Presenter:

Alex Petersen

Title:

Wasserstein Regression and Covariance for Random Densities

Affiliation:

UC Santa Barbara

Date:

October 25, 2018

Abstract:

Samples of density functions appear in a variety of disciplines, including distributions of mortality across nations, CT density histograms of hematoma in post-stroke patients, and distributions of voxel-to-voxel correlations of fMRI signals across subjects. The nonlinear nature of density space necessitates adaptations and new methodologies for the analysis of random densities. We define our geometry using the Wasserstein metric, an increasingly popular choice in theory and application. First, when densities appear as responses in a regression model, the utility of Fréchet regression, a general purpose methodology for response objects in a metric space, is demonstrated. Due to the manifold structure of the space, inferential methods are developed allowing for tests of global and partial effects, as well as simultaneous confidence bands for fitted densities. Second, a notion of Wasserstein covariance is proposed for multivariate density data (a vector of densities), where multiple densities are observed for each subject. This interpretable dependence measure is shown to reveal interesting differences in functional connectivity between a group of Alzheimer's subjects and a control group.
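In one dimension, the 2-Wasserstein geometry reduces to the L2 geometry of quantile functions: the distance between two densities is the L2 distance between their quantile functions, and the barycenter (Fréchet mean) averages quantile functions pointwise. The sketch below illustrates only this building block, not the Fréchet regression or Wasserstein covariance developed in the talk; the grid and Gaussian samples are illustrative:

```python
import numpy as np

p_grid = np.linspace(0.005, 0.995, 200)           # common probability grid

def quantile_fn(sample):
    return np.quantile(sample, p_grid)

def wasserstein2(sample_a, sample_b):
    qa, qb = quantile_fn(sample_a), quantile_fn(sample_b)
    return np.sqrt(np.mean((qa - qb) ** 2))       # approximates the integral over p

def wasserstein_barycenter(samples):
    return np.mean([quantile_fn(s) for s in samples], axis=0)   # a quantile function

rng = np.random.default_rng(3)
samples = [rng.normal(loc=mu, scale=1.0, size=1000) for mu in (0.0, 0.5, 1.0)]
print(wasserstein2(samples[0], samples[2]))       # roughly the mean shift, about 1.0
barycenter_q = wasserstein_barycenter(samples)
```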

Website:

Dr. Petersen's Website

2018-10-18 - Abel Rodriguez - Spherical Factor Analysis for Binary Data: A Look at the Conservative Revolt in the US House of Representatives

Presenter:

Abel Rodriguez

Title:

Spherical Factor Analysis for Binary Data: A Look at the Conservative Revolt in the US House of Representatives

Affiliation:

UC Santa Cruz

Date:

October 18, 2018

Abstract:

Factor models for binary data are extremely common in many social science disciplines. For example, in political science binary factor models are often used to explain voting patterns in deliberative bodies such as the US Congress, leading to an “ideological” ranking of legislators. Binary factor models can be motivated through so-called “spatial” voting models, which posit that legislators have a most preferred policy – their ideal point – which can be represented as a point in some Euclidean “policy space”. Legislators then vote for or against motions in accordance with the distance between their (latent) preferences and the position of the bill in the same policy space. In this talk we introduce a novel class of binary factor models derived from spatial voting models in which the policy space corresponds to a non-Euclidean manifold. In particular, we consider embedding legislators’ preferences on the surface of an n-dimensional sphere. The resulting model contains the standard binary Euclidean factor model as a limiting case, and provides a mechanism to operationalize (and extend) the so-called “horseshoe theory” in political science, which postulates that the far left and far right are more similar to each other in essentials than either is to the political center. The performance of the model is illustrated using voting data from recent US Congresses. In particular, we show that voting patterns for the 113th US House of Representatives are better explained by a circular factor model than by either a one- or a two-dimensional Euclidean model, and that the circular model yields a ranking of legislators more in accord with experts’ expectations.
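A toy version of the circular (one-dimensional spherical) voting idea, with an illustrative logistic link and made-up parameters rather than the paper's construction: vote probabilities decay with geodesic distance on the circle, so the far left and far right can sit close together even though they are far apart on a line.

```python
import numpy as np

def geodesic(theta, psi):
    """Arc-length distance between two angles on the unit circle."""
    d = np.abs(theta - psi) % (2 * np.pi)
    return np.minimum(d, 2 * np.pi - d)

def vote_probability(theta_legislator, psi_bill, alpha=2.0, beta=3.0):
    """Illustrative logistic link: closer on the circle means higher P(yea)."""
    return 1.0 / (1.0 + np.exp(-(alpha - beta * geodesic(theta_legislator, psi_bill))))

far_left, far_right, center = 3.0, -3.0, 0.0      # positions in radians
print(geodesic(far_left, far_right))              # about 0.28: the extremes are neighbors
print(geodesic(far_left, center))                 # 3.0: far from the center
print(vote_probability(far_left, psi_bill=2.8))   # high P(yea) for a nearby bill
```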

Website:

Dr. Rodriguez's Website

2018-09-20 - Scott Grimshaw - Going Viral, Binge Watching, and Attention Cannibalism

Presenter:

Dr. Scott Grimshaw

Title:

Going Viral, Binge Watching, and Attention Cannibalism

Affiliation:

BYU

Date:

September 20, 2018

Abstract:

Since digital entertainment is often described as viral, this paper uses the vocabulary and statistical methods of disease modeling to analyze viewer data from an experiment at BYUtv in which a program's premiere was exclusively digital. Onset time, the days from the program premiere to a viewer watching the first episode, is modeled using a changepoint between epidemic viewing with a non-constant hazard rate and endemic viewing with a constant hazard rate. Finish time, the days from onset to a viewer watching all episodes, uses an expanded negative binomial hurdle model to reflect the characteristics of binge watching. The hurdle component models binge racing, where a viewer watches all episodes on the same day as onset. One reason binge watching appeals to viewers is that they can focus attention on a single program's story line and characters before moving on to a second program. This translates to a competing risks model that has an impact on scheduling digital premieres. Attention cannibalism occurs when a viewer takes a long time watching their first-choice program and then never watches a second program, or delays watching it until much later. Staggering premiere dates reduces attention cannibalism.
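A sketch of the hurdle piece of the finish-time model, assuming a plain zero-truncated negative binomial for positive finish times and a point mass at zero days for binge racing; the parameterization and data are illustrative, not the expanded model from the talk:

```python
import numpy as np
from scipy.stats import nbinom

def hurdle_nb_loglik(finish_days, pi_race, n, p):
    """pi_race: P(finish on day 0, i.e. binge racing); (n, p): negative binomial parameters."""
    finish_days = np.asarray(finish_days)
    zero = finish_days == 0
    loglik = np.sum(zero) * np.log(pi_race)
    trunc = 1.0 - nbinom.pmf(0, n, p)             # renormalizer for finish times > 0
    loglik += np.sum(np.log(1.0 - pi_race)
                     + nbinom.logpmf(finish_days[~zero], n, p)
                     - np.log(trunc))
    return loglik

days = [0, 0, 3, 7, 1, 0, 14, 2]                  # hypothetical finish times in days
print(hurdle_nb_loglik(days, pi_race=0.4, n=2, p=0.3))
```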

Website:

Dr. Grimshaw's website

2018-04-12 - Cristian Tomasetti - Cancer etiology, evolution and early detection

Presenter:

Dr. Cristian Tomasetti

Title:

Cancer etiology, evolution, and early detection

Affiliation:

Johns Hopkins University School of Medicine

Date:

Apr 12, 2018

Abstract:

The standard paradigm in cancer etiology is that inherited factors and lifestyle or environmental exposures are the causes of cancer. I will present recent findings indicating that a third cause, never considered before, plays a large role: "bad luck", i.e., the pure chance involved in DNA replication when cells divide. Novel mathematical and statistical methodologies for distinguishing among these causes will also be introduced. I will then conclude with a new approach for the early detection of cancer.

Website:

Dr. Tomasetti's Website

2018-03-29 - H. Dennis Tolley - What's the Likelihood?

Presenter:

H. Dennis Tolley

Title:

What's the Likelihood?

Affiliation:

BYU

Date:

Mar 29, 2018

Abstract:

The likelihood function plays a major role in both frequentist and Bayesian methods of data analysis. Non-parametric Bayesian models also rely heavily on the form of the likelihood. Despite its heuristic foundation, the likelihood has several desirable large-sample statistical properties that prompt its use among frequentists. Additionally, there are other important facets of the likelihood that warrant its formulation in many circumstances. As fundamental as the likelihood is, however, beginning students are given only a cursory introduction to how to formulate it. This seminar illustrates the formulation of the likelihood for a family of statistical problems common in the physical sciences. By examining the basic scientific principles associated with an experimental set-up, we show the step-by-step construction of the likelihood, starting with the discrete random walk model as a paradigm. The resulting likelihood is the solution to a stochastic differential equation. Elementary applications of the likelihood are illustrated.
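In the spirit of the talk's starting point, here is a worked toy example of constructing a likelihood from a discrete random walk: with n steps of +1 (probability p) or -1, the endpoint is X_n = 2K - n, where K is the number of up-steps, so observing only the endpoint gives a binomial likelihood in K = (X_n + n)/2. The numbers are illustrative:

```python
import numpy as np
from scipy.stats import binom

def random_walk_likelihood(p, n_steps, endpoint):
    k_up = (endpoint + n_steps) // 2              # number of +1 steps implied by the endpoint
    return binom.pmf(k_up, n_steps, p)

p_grid = np.linspace(0.01, 0.99, 99)
lik = random_walk_likelihood(p_grid, n_steps=20, endpoint=6)   # 13 up-steps, 7 down-steps
print(p_grid[np.argmax(lik)])                     # maximum likelihood estimate near 13/20 = 0.65
```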

Website:

Dr. Tolley's website

2018-03-22 - Matthew Heaton - Methods for Analyzing Large Spatial Data: A Review and Comparison

Presenter:

Dr. Matthew Heaton

Title:

Methods for Analyzing Large Spatial Data: A Review and Comparison

Affiliation:

BYU

Date:

Mar 22, 2018

Abstract:

The Gaussian process is an indispensable tool for spatial data analysts. The onset of the “big data” era, however, has led to the traditional Gaussian process being computationally infeasible for modern spatial data. As such, various alternatives to the full Gaussian process that are more amenable to handling big spatial data have been proposed. These modern methods often exploit low-rank structures and/or multi-core and multi-threaded computing environments to facilitate computation. This study provides, first, an introductory overview of several methods for analyzing large spatial data. Second, this study describes the results of a predictive competition among the described methods as implemented by different groups with strong expertise in the methodology. Specifically, each research group was provided with two training datasets (one simulated and one observed) along with a set of prediction locations. Each group then wrote their own implementation of their method to produce predictions at the given locations, and each implementation was subsequently run on a common computing environment. The methods were then compared in terms of various predictive diagnostics.
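As a flavor of one family of methods covered in the review, the sketch below builds a knot-based low-rank ("predictive process"-style) approximation in one dimension: the full n x n covariance is replaced by a rank-m matrix built from m knots, so only m x m systems are solved. The kernel, knot placement, and data are illustrative choices, not those used in the competition:

```python
import numpy as np

def exp_cov(a, b, range_=0.2, sill=1.0):
    return sill * np.exp(-np.abs(a[:, None] - b[None, :]) / range_)

rng = np.random.default_rng(4)
x = rng.uniform(0, 1, 2000)                       # n = 2000 observation locations
y = np.sin(6 * x) + 0.1 * rng.standard_normal(len(x))
knots = np.linspace(0, 1, 30)                     # m = 30 << n
tau2 = 0.1 ** 2                                   # nugget / measurement-error variance

C_ku = exp_cov(x, knots)                          # n x m cross-covariance
C_uu = exp_cov(knots, knots)                      # m x m knot covariance

# Woodbury identity: invert only m x m matrices to get alpha = Sigma^{-1} y
A = C_uu + C_ku.T @ C_ku / tau2
alpha = y / tau2 - C_ku @ np.linalg.solve(A, C_ku.T @ y) / tau2 ** 2

x_pred = np.linspace(0, 1, 100)
pred = exp_cov(x_pred, knots) @ np.linalg.solve(C_uu, C_ku.T @ alpha)   # kriging mean
```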

Website:

Dr. Heaton's website

2018-03-15 - Timothy Hanson - A unified framework for fitting Bayesian semiparametric models to arbitrarily censored spatial survival data

Presenter:

Timothy Hanson

Title:

A unified framework for fitting Bayesian semiparametric models to arbitrarily censored spatial survival data

Affiliation:

Medtronic

Date:

Mar 15, 2018

Abstract:

A comprehensive, unified approach to modeling arbitrarily censored spatial survival data is presented for the three most commonly-used semiparametric models: proportional hazards, proportional odds, and accelerated failure time. Unlike many other approaches, all manner of censored survival times are simultaneously accommodated including uncensored, interval censored, current-status, left and right censored, and mixtures of these. Left truncated data are also accommodated leading to models for time-dependent covariates. Both georeferenced (location observed exactly) and areally observed (location known up to a geographic unit such as a county) spatial locations are handled. Variable selection is also incorporated. Model fit is assessed with conditional Cox-Snell residuals, and model choice carried out via LPML and DIC. Baseline survival is modeled with a novel transformed Bernstein polynomial prior. All models are fit via new functions which call efficient compiled C++ in the R package spBayesSurv. The methodology is broadly illustrated with simulations and real data applications. An important finding is that proportional odds and accelerated failure time models often fit significantly better than the commonly-used proportional hazards model.
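To illustrate how a single likelihood can absorb all censoring types at once, the sketch below uses a parametric lognormal AFT model (a stand-in for the talk's semiparametric transformed Bernstein polynomial baseline, with spatial terms and the spBayesSurv machinery omitted); each observation is encoded by a (lower, upper) pair, and the data and coefficients are hypothetical:

```python
import numpy as np
from scipy.stats import norm

def aft_loglik(beta, sigma, X, lower, upper):
    """Lognormal AFT: log T = X @ beta + sigma * Z, Z ~ N(0, 1).
    Censoring is encoded by (lower, upper):
      uncensored: lower == upper == t; right-censored: (t, inf);
      left-censored: (0, t); interval-censored: (l, u)."""
    mu = X @ beta
    ll = 0.0
    for lo, up, m in zip(lower, upper, mu):
        if lo == up:                                            # exact event time
            ll += norm.logpdf(np.log(lo), m, sigma) - np.log(lo)
        else:                                                   # any censoring type
            hi = norm.cdf((np.log(up) - m) / sigma) if np.isfinite(up) else 1.0
            lo_cdf = norm.cdf((np.log(lo) - m) / sigma) if lo > 0 else 0.0
            ll += np.log(hi - lo_cdf)
    return ll                                                   # left truncation omitted here

X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 1.0], [1.0, 0.0]])
lower = np.array([2.0, 1.5, 3.0, 0.0])        # exact, interval-, right-, and left-censored
upper = np.array([2.0, 2.5, np.inf, 1.0])
print(aft_loglik(np.array([0.5, -0.3]), 1.0, X, lower, upper))
```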

Website:

Dr. Hanson's LinkedIn

2018-03-08 - Daniel Nettleton - Random Forest Prediction Intervals

Presenter:

Dr. Daniel Nettleton

Title:

Random Forest Prediction Intervals

Affiliation:

Iowa State University

Date:

Mar 8, 2018

Abstract:

Breiman's seminal paper on random forests has more than 30,000 citations according to Google Scholar. The impact of Breiman's random forests on machine learning, data analysis, data science, and science in general is difficult to measure but unquestionably substantial. The virtues of random forest methodology include no need to specify functional forms relating predictors to a response variable, capable performance for low-sample-size, high-dimensional data, general prediction accuracy, easy parallelization, few tuning parameters, and applicability to a wide range of prediction problems with categorical or continuous responses. Like many algorithmic approaches to prediction, random forests are typically used to produce point predictions that are not accompanied by information about how far those predictions may be from true response values. From the statistical point of view, this is unacceptable; a key characteristic that distinguishes statistically rigorous approaches to prediction from others is the ability to provide quantifiably accurate assessments of prediction error from the same data used to generate point predictions. Thus, we develop a prediction interval, based on a random forest prediction, that gives a range of values that will contain an unknown continuous univariate response with any specified level of confidence. We illustrate our proposed approach to interval construction with examples and demonstrate its effectiveness relative to other approaches for interval construction using random forests.
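One simple way to wrap a random forest point prediction in an interval is to use empirical quantiles of out-of-bag (OOB) residuals as an additive correction; the sketch below follows that spirit but is not necessarily the authors' exact construction, and the simulated data are illustrative:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

X, y = make_regression(n_samples=1000, n_features=10, noise=10.0, random_state=0)
X_train, y_train, X_new, y_new = X[:800], y[:800], X[800:], y[800:]

rf = RandomForestRegressor(n_estimators=500, oob_score=True, random_state=0)
rf.fit(X_train, y_train)

oob_resid = y_train - rf.oob_prediction_          # residuals from out-of-bag predictions
lo, hi = np.quantile(oob_resid, [0.05, 0.95])     # endpoints for a nominal 90% interval

point = rf.predict(X_new)
intervals = np.column_stack([point + lo, point + hi])

coverage = np.mean((y_new >= intervals[:, 0]) & (y_new <= intervals[:, 1]))
print(f"empirical coverage on held-out data: {coverage:.2f}")
```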

Website:

Dr. Nettleton's website

2018-02-22 - Robert Richardson - Non-Gaussian Translation Processes

Presenter:

Robert Richardson

Title:

Non-Gaussian Translation Processes

Affiliation:

BYU

Date:

Feb 22, 2018

Abstract:

A non-Gaussian translation process is a construction used in some engineering applications in which a stochastic process is given non-Gaussian marginal distributions. It can be considered a hierarchical copula model in which the correlation structure of the process is defined separately from its marginal distributional characteristics. This approach also yields a simple likelihood function for the finite-dimensional distributions of the stochastic process. These processes will be shown, in several applications, either to perform tasks that could not be done previously or to perform them much more efficiently, including non-Gaussian option pricing, general multivariate stable spatial processes, and non-Gaussian spatio-temporal dynamic modeling.
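The basic translation construction can be sketched in a few lines: simulate a Gaussian process with the desired correlation structure, push each margin through the standard normal CDF, then through the inverse CDF of the target marginal. The exponential correlation and gamma marginal below are illustrative choices:

```python
import numpy as np
from scipy.stats import norm, gamma

rng = np.random.default_rng(5)
x = np.linspace(0, 1, 300)                        # 1-D spatial locations

# Gaussian process with exponential correlation (plus a small jitter for stability)
corr = np.exp(-np.abs(x[:, None] - x[None, :]) / 0.1)
z = np.linalg.cholesky(corr + 1e-10 * np.eye(len(x))) @ rng.standard_normal(len(x))

# Translate: uniform scores, then the target (gamma) marginal
u = norm.cdf(z)
y = gamma.ppf(u, a=2.0, scale=1.5)                # non-Gaussian translation field
```

The correlation structure and the marginal distribution are specified separately, which is the copula-like feature the abstract describes.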

Website:

Dr. Richardson's Website

2018-02-15 - Jeffery Tessem - How to make more beta cells: exploring molecular pathways that increase functional beta cell mass as a cure for Type 1 and Type 2 diabetes

Presenter:

Dr. Jeffery S Tessem

Title:

How to make more beta cells: exploring molecular pathways that increase functional beta cell mass as a cure for Type 1 and Type 2 diabetes

Affiliation:

Department of Nutrition, Dietetics and Food Science at BYU

Date:

Feb 15, 2018

Abstract:

Both Type 1 (T1D) and Type 2 diabetes (T2D) are caused by a relative insufficiency in functional β-cell mass. Current therapeutic options for diabetes include daily insulin injections to maintain normoglycemia, pharmacological agents to stimulate β-cell function and enhance insulin sensitivity, or islet transplantation. A major obstacle to greater application of islet transplantation therapy is the scarcity of human islets. Thus, new methods for expansion of β-cell mass, applied in vitro to generate the large numbers of human islet cells needed for transplantation, or in situ to induce expansion of the patient's remaining β-cells, could have broad therapeutic implications for this disease. To this end, our lab is interested in delineating the molecular pathways that increase β-cell proliferation, enhance glucose-stimulated insulin secretion, and protect against β-cell death.

Website:

Dr. Tessem's Website

2018-02-08 - Chris Groendyke - Bayesian Inference for Contact Network Models using Epidemic Data

Presenter:

Chris Groendyke

Title:

Bayesian Inference for Contact Network Models using Epidemic Data

Affiliation:

Robert Morris University

Date:

Feb 8, 2018

Abstract:

I will discuss how network models can be used to study the spread of epidemics through a population, and in turn what epidemics can tell us about the structure of this population. I apply a Bayesian methodology to data from a disease presumed to have spread across a contact network in a population in order to perform inference on the parameters of the underlying network and disease models. Using a simulation study, I will discuss the strengths, weaknesses, and limitations of these models and the data required for this type of inference. Finally, I will describe an analysis of an actual measles epidemic that spread through the town of Hagelloch, Germany, in 1861 and share the conclusions it allows us to make regarding the population structure.
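A forward simulation of the kind of data-generating process the talk inverts: a discrete-time SIR epidemic spreading over a random contact network. The Erdős-Rényi network, per-contact infection probability, and recovery probability are illustrative; the Bayesian inference for the network and disease parameters is the subject of the talk and is not reproduced here:

```python
import numpy as np
import networkx as nx

rng = np.random.default_rng(6)
G = nx.erdos_renyi_graph(n=200, p=0.03, seed=6)   # hypothetical contact network

beta, gamma_ = 0.3, 0.15                          # per-contact infection / recovery probabilities
status = {node: "S" for node in G}                # susceptible, infectious, or removed
status[0] = "I"                                   # index case
infection_times = {0: 0}

for t in range(1, 100):
    infectious = [v for v, s in status.items() if s == "I"]
    if not infectious:
        break
    for v in infectious:
        for nbr in G.neighbors(v):
            if status[nbr] == "S" and rng.random() < beta:
                status[nbr] = "I"
                infection_times[nbr] = t
        if rng.random() < gamma_:
            status[v] = "R"

print(f"final size: {sum(s != 'S' for s in status.values())} of {G.number_of_nodes()}")
```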

Website:

Chris's Website

2018-02-01 - Larry Baxter - Structure in Prior PDFs and Its Effect on Bayesian Analysis

Presenter:

Larry Baxter

Title:

Structure in Prior PDFs and Its Effect on Bayesian Analysis

Affiliation:

BYU

Date:

Feb 1, 2018

Abstract:

Bayesian statistics formalizes a procedure for combining established (prior) statistical knowledge with current knowledge to produce a posterior statistical description that presumably is better than either the prior or the new knowledge by itself. Two common applications of this theory involve (a) combining established (literature) estimates of model parameters with new data to produce better parameter estimates, and (b) estimating model prediction confidence bands. Frequently, the prior information includes reasonable parameter estimates, poorly quantified and often subjective parameter uncertainty estimates, and no information regarding how the values of one parameter affect the confidence intervals of other parameters. All three of these parameter characteristics affect Bayesian analysis. The first two receive a great deal of attention. The third characteristic, the dependence of model parameters on one another, creates structure in the prior pdfs. This structure strongly influences Bayesian results, often to an extent that rivals or surpasses that of the parameter uncertainty estimates themselves. Nevertheless, Bayesian analyses commonly ignore this structure.

All structure stems primarily from the form of the model and, in linear models, does not depend on the observations themselves. Most models produce correlated parameters when applied to real-world engineering and science data. The most common example of structure is parameter correlation coefficients. Linear models produce linear parameter correlations that depend on the magnitude of the independent variable under analysis but that, in most practical applications, are large, often close to unity. Nonlinear models also generally have correlated parameters. However, the correlations can be nonlinear, even discontinuous, and generally involve more complexity than linear-model parameter correlations. Parameter correlations profoundly affect the results of Bayesian parameter estimation and prediction uncertainty. Properly incorporated structure produces Bayesian results that powerfully illustrate the strength and potential contribution of the theory. Bayesian analyses that ignore such structure produce poor or even nonsensical results, often significantly worse than a superficial guess.

This seminar demonstrates the importance of prior structure in both parameter estimation and uncertainty quantification using real data from typical engineering systems. Perhaps most importantly, the discussion illustrates methods of incorporating parameter structure for any given model that do not rely on observations. These methods quantify parameter structure, including the lack of structure, for linear and nonlinear models.
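A small illustration of the linear-model point, assuming a simple intercept-plus-slope regression: the parameter correlation structure comes from the design matrix alone, via (X'X)^{-1}, not from the observed responses, and predictor values far from zero make the intercept and slope almost perfectly (negatively) correlated:

```python
import numpy as np

def parameter_correlation(x):
    X = np.column_stack([np.ones_like(x), x])     # intercept + slope design matrix
    cov_unscaled = np.linalg.inv(X.T @ X)         # proportional to the parameter covariance
    sd = np.sqrt(np.diag(cov_unscaled))
    return cov_unscaled / np.outer(sd, sd)

x_centered = np.linspace(-1, 1, 50)               # predictor centered at zero
x_offset = np.linspace(99, 101, 50)               # same spread, far from zero

print(parameter_correlation(x_centered)[0, 1])    # about 0: no structure
print(parameter_correlation(x_offset)[0, 1])      # about -1: strong prior structure
```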

Website:

Larry's Website
