Data Analytics

Research & Publications

Data Analytics faculty are accomplished educators and recognized experts in their fields. Their research covers statistical methods such as Bayesian methods, analytics in business and education, and other areas of data analytics. Faculty in this area regularly publish in academic journals and collaborate with students on undergraduate research.

Recent publications include:

Strader, T., Rozycki, J., Root, T., and Huang, Y. (2020). Machine learning stock market prediction studies: Review and research directions. Journal of International Technology and Information Management, 28(4).

Stock market investment strategies are complex and rely on an evaluation of vast amounts of data. In recent years, machine learning techniques have increasingly been examined to assess whether they can improve market forecasting when compared with traditional approaches. The objective for this study is to identify directions for future machine learning stock market prediction research based upon a review of current literature. A systematic literature review methodology is used to identify relevant peer-reviewed journal articles from the past twenty years and categorize studies that have similar methods and contexts. Four categories emerge: artificial neural network studies, support vector machine studies, studies using genetic algorithms combined with other techniques, and studies using hybrid or other artificial intelligence approaches. Studies in each category are reviewed to identify common findings, unique findings, limitations, and areas that need further investigation. The final section provides overall conclusions and directions for future research.

Wilcoxson, J., Follett, L., and Severe, S. (2020). Forecasting foreign exchange markets using Google Trends: Prediction performance of competing models. Journal of Behavioral Finance.

Foreign exchange markets affect a variety of people and businesses worldwide, and there is a wide array of literature aimed at providing more accurate forecasts of their movement. In an attempt to quantify human expectations, Google query search terms related to foreign exchange markets are used to help explain and predict foreign exchange rates between the United States dollar and ten other currencies from January 2004 to August 2018. We find evidence that, while Google Trends can be helpful in prediction, it is necessary to implement some sort of shrinkage or sparsity scheme on the coefficients.
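The abstract does not say which shrinkage or sparsity scheme the authors used. As a rough illustration of the general idea only, the sketch below applies soft-thresholding, which is the lasso solution when predictors are orthonormal. The coefficient values and the penalty are hypothetical and are not taken from the paper.

```python
def soft_threshold(coefficient: float, penalty: float) -> float:
    """Shrink a coefficient toward zero; return exactly zero if it is small."""
    if penalty < 0:
        raise ValueError("penalty must be non-negative")
    magnitude = abs(coefficient) - penalty
    if magnitude <= 0:
        return 0.0  # small coefficients are removed entirely (sparsity)
    return magnitude if coefficient > 0 else -magnitude


# Hypothetical unpenalized estimates for four search-term predictors.
raw_estimates = [2.0, -0.3, 0.1, -1.2]
shrunk = [soft_threshold(b, penalty=0.5) for b in raw_estimates]
print(shrunk)
```

Large coefficients are pulled toward zero by the penalty, while coefficients smaller than the penalty are dropped, which is the behavior the abstract describes as necessary when many search terms compete as predictors.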

Follett, L. and Vander Naald, B. (2020). Explaining variability in tourist preferences: A Bayesian model well suited to small samples. Tourism Management, 78.

Discrete choice experiments are becoming more popular in the tourism and travel literature. While Bayesian methods to analyze discrete choice experiment data have been used in other disciplines, they have not been used in the tourism literature. In this article, we develop a Bayesian Mixed Logit Model in which we use a little-known prior distribution developed by Lewandowski, Kurowicka, and Joe (LKJ) and half-Cauchy distributions as an alternative to the more traditionally used inverse Wishart distribution as a prior scheme for the covariance matrix of random parameters in mixed logit estimation. Using multiple simulated data sets, we show that use of the LKJ prior scheme improves the estimation of coefficients, especially for small data sets. Finally, we test the model with an actual small discrete choice data set examining tourist preferences for reducing glacier recession, and discuss the implications of the model for research and policy.
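An LKJ and half-Cauchy prior scheme is commonly set up by splitting the covariance matrix into scales and a correlation matrix, Sigma = diag(tau) * Omega * diag(tau), with half-Cauchy priors on the scales tau and an LKJ prior on Omega. The sketch below assumes that decomposition; the abstract does not spell out the paper's exact parameterization. The correlation matrix is fixed by hand rather than sampled from an LKJ distribution, and all function names are hypothetical.

```python
import math
import random


def half_cauchy_draw(rng: random.Random, scale: float = 1.0) -> float:
    """Draw from a half-Cauchy(0, scale) distribution by inverting its CDF."""
    u = rng.random()  # uniform on [0, 1)
    return scale * math.tan(math.pi * u / 2.0)


def covariance_from_parts(scales, correlation):
    """Return diag(scales) * correlation * diag(scales) as nested lists."""
    size = len(scales)
    return [
        [scales[i] * correlation[i][j] * scales[j] for j in range(size)]
        for i in range(size)
    ]


rng = random.Random(0)
scales = [half_cauchy_draw(rng) for _ in range(2)]
# Assumption: fixed for illustration; under the prior this would be LKJ-distributed.
correlation = [[1.0, 0.5], [0.5, 1.0]]
covariance = covariance_from_parts(scales, correlation)
print(covariance)
```

Separating scales from correlations lets each part receive its own prior, whereas an inverse Wishart prior ties the two together.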

Henderson, H. and Follett, L. (2020). A Bayesian framework for estimating human capabilities. World Development, 129.

The capabilities approach provides a rich framework for welfare assessment, but its practical relevance is limited by methodological difficulties associated with the measurement of human capabilities. We argue that, unlike existing approaches to capability estimation, Bayesian stochastic frontier analysis (BSFA) is consistent with the key features of the capabilities approach and thus provides a natural framework for estimating capabilities. Using simulated data, we show that BSFA outperforms the leading alternatives (e.g., structural equation models) in comparable settings. We further show that our approach is more flexible than the alternatives: BSFA can provide cardinal representations of entire capability sets and can be used with continuous, discrete, and multivariate outcomes. Finally, we provide an empirical illustration of our estimator by examining the impact of Uganda’s Youth Opportunities Program on the educational capabilities of children in the treated households.

Follett, L. and Yu, C. (2019). Achieving parsimony in Bayesian vector autoregressions with the horseshoe prior. Econometrics and Statistics, 11, 130-144.

In the context of a vector autoregression (VAR) model, or any multivariate regression model, the number of relevant predictors may be small relative to the information set that is available. It is well known that forecasts based on (un-penalized) least squares estimates can overfit the data and lead to poor predictions. Since the Minnesota prior was proposed, there have been many methods developed aiming at improving prediction performance. The horseshoe prior is proposed in the context of a Bayesian VAR. The horseshoe prior is a unique shrinkage prior scheme in that it shrinks irrelevant signals rigorously to 0 while allowing large signals to remain large and practically unshrunk. In an empirical study, it is shown that the horseshoe prior competes favorably with shrinkage schemes commonly used in Bayesian VAR models as well as with a prior that imposes true sparsity in the coefficient vector. Additionally, the use of particle Gibbs with backwards simulation is proposed for the estimation of the time-varying volatility parameters. A detailed description of relevant MCMC methods is provided in the supplementary material.
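The shrinkage behavior described above can be seen in the simplest textbook setting for the horseshoe prior: a single observation y ~ N(beta, 1) with beta ~ N(0, lambda^2 * tau^2), where lambda is a local scale and tau a global scale. The sketch below covers only that simplified case, not the paper's Bayesian VAR or its MCMC scheme, and the scale values are hypothetical.

```python
def shrinkage_weight(local_scale: float, global_scale: float) -> float:
    """Weight kappa in (0, 1]: near 1 means shrunk to zero, near 0 means unshrunk."""
    return 1.0 / (1.0 + (local_scale * global_scale) ** 2)


def conditional_posterior_mean(observation, local_scale, global_scale):
    """Posterior mean of beta given both scales, for y ~ N(beta, 1)."""
    return (1.0 - shrinkage_weight(local_scale, global_scale)) * observation


global_scale = 0.1  # hypothetical: a small global scale pulls everything toward zero
# A small local scale marks a noise coefficient; a large one lets a signal escape.
print(conditional_posterior_mean(0.4, local_scale=0.5, global_scale=global_scale))
print(conditional_posterior_mean(6.0, local_scale=100.0, global_scale=global_scale))
```

Because the local scales have heavy-tailed half-Cauchy priors, individual coefficients can take a large local scale and remain nearly unshrunk even when the global scale is small.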

Follett, L., Geletta, S., and Laugerman, M. (2019). Quantifying risk associated with clinical trial termination: A text mining approach. Journal of Information Processing & Management, 56(3), 516-525.

Clinical trials that terminate prematurely without reaching conclusions raise financial, ethical, and scientific concerns. Scientific studies in all disciplines are initiated with extensive planning and deliberation, often by a team of highly trained scientists. To assure that the quality, integrity, and feasibility of funded research projects meet the required standards, research-funding agencies such as the National Institutes of Health and the National Science Foundation pass proposed research plans through a rigorous peer review process before making funding decisions. Yet, some study proposals successfully pass through all the rigorous scrutiny of the scientific peer review process, but the proposed investigations end up being terminated before yielding results. This study demonstrates an algorithm that quantifies the risk associated with a study being terminated based on the analysis of patterns in the language used to describe the study prior to its implementation. To quantify the risk of termination, we use data from the ClinicalTrials.gov repository, from which we extracted structured data that flagged study characteristics, and unstructured text data that described the study goals, objectives and methods in a standard narrative form. We propose an algorithm to extract distinctive words from this unstructured text data that are most frequently used to describe trials that were completed successfully vs. those that were terminated. Binary variables indicating the presence of these distinctive words in trial proposals are used as input in a random forest, along with standard structured data fields. In this paper, we demonstrate that this combined modeling approach yields robust predictive probabilities in terms of both sensitivity (0.56) and specificity (0.71), relative to a model that utilizes the structured data alone (sensitivity = 0.03, specificity = 0.97). These predictive probabilities can be applied to make judgements about a trial's feasibility using information that is available before any funding is granted.
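Two pieces of this pipeline are simple enough to sketch: turning a trial description into binary word-presence features, and computing sensitivity and specificity from predictions. The word list, the example text, and the labels below are hypothetical, and the sketch assumes that "terminated" is the positive class (coded 1). It omits the paper's word-extraction algorithm and the random forest itself.

```python
def word_indicators(description, distinctive_words):
    """Return 1/0 flags marking which distinctive words appear in the text."""
    tokens = set(description.lower().split())
    return [1 if word in tokens else 0 for word in distinctive_words]


def sensitivity_specificity(actual, predicted):
    """Return (sensitivity, specificity), treating 1 as 'terminated'.

    Assumes both classes occur in `actual`; otherwise a division by zero occurs.
    """
    pairs = list(zip(actual, predicted))
    true_pos = sum(1 for a, p in pairs if a == 1 and p == 1)
    false_neg = sum(1 for a, p in pairs if a == 1 and p == 0)
    true_neg = sum(1 for a, p in pairs if a == 0 and p == 0)
    false_pos = sum(1 for a, p in pairs if a == 0 and p == 1)
    sensitivity = true_pos / (true_pos + false_neg)
    specificity = true_neg / (true_neg + false_pos)
    return sensitivity, specificity


# Hypothetical distinctive words and trial description.
features = word_indicators("A pilot study of enrollment", ["pilot", "placebo"])
# Hypothetical outcomes: 1 = terminated, 0 = completed.
metrics = sensitivity_specificity([1, 1, 0, 0, 0], [1, 0, 0, 0, 1])
print(features, metrics)
```

Reporting both metrics matters here: the structured-data-only model in the abstract reaches a specificity of 0.97 while detecting almost no terminated trials (sensitivity 0.03).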

Strader, T. and Bryant, A. (2018). University opportunities, abilities and motivations to create data analytics programs. Journal of the Midwest Association for Information Systems, 1, 37-48.

Some US colleges and universities have developed undergraduate and graduate data analytics programs in the past five years, but not all universities appear to have sufficient resources and incentives to venture into this multidisciplinary academic area. The purpose of this study is to identify the characteristics of schools that have developed data analytics programs. The study utilizes the motivation-ability-opportunity (MAO) theoretical framework to identify factors that increase the likelihood that a university will develop a data analytics program. An analysis of 391 regional master’s universities in the US finds that schools with data analytics programs are more likely to be in larger cities and have larger student enrollments, better educational quality rankings, and an existing statistics and/or actuarial science program. These findings support the idea that data analytics programs are more likely to be created when universities have opportunities to access a larger number of businesses and governmental organizations, and sufficient resources to support program development, while also having abilities associated with innovation and faculty resources. Preliminary results also indicate that there are two motivations: the need to increase student enrollment and the need to maintain an up-to-date curriculum.
