Stan, Scalable Software for Bayesian Modeling

Awardees

Andrew Gelman
Higgins Professor of Statistics and Professor of Political Science

$499,637

This award is to design, code, document, test, disseminate, and maintain Stan, an extensible, open-source, cross-platform software framework and compiler for efficient and scalable Bayesian statistical modeling. The first step in Bayesian modeling is setting up a full probability model for all quantities of interest. Stan facilitates this process by providing an expressive and extensible domain-specific programming language for specifying probabilistic models. By compiling a model specification into executable code, Stan fully automates the second step of Bayesian inference: calculating the probabilities of unobserved quantities, such as model parameters and future observations, conditional on observed data. The third step involves evaluating the fit of the model to the data and its predictions for unseen data. When the model is easy to encode and inferences are computed quickly and automatically, it is easy to iterate over the specification, fitting, and evaluation steps in order to refine the scientific model. Stan improves on the existing state of the art in both algorithmic and implementation details. Unlike its predecessors, which interpret models on the fly, Stan compiles models to C++ code, which dramatically improves both scalability and efficiency. Stan provides a full algorithmic differentiation library for the functions required for statistical modeling. Algorithmic differentiation applies the chain rule from calculus to the program computing the probability function in order to calculate derivatives efficiently and accurately, in a small multiple of the time taken to evaluate the function itself, independent of dimensionality. As a result, Stan can fully automate the model-fitting stage given only a specification of the probability function in Stan's modeling language.
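
As an illustration of the chain-rule technique described above, the following is a minimal sketch of reverse-mode algorithmic differentiation in Python. It is not Stan's C++ library: the names `Var`, `log`, `backward`, and `normal_log_density_gradient` are hypothetical, and the normal log density (up to an additive constant) is chosen only as a familiar example of a probability function. Each arithmetic operation records its local partial derivatives, and a single backward sweep then accumulates the gradient with respect to every parameter.

```python
import math


class Var:
    """Scalar node in a computation graph (hypothetical name, not Stan's API)."""

    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents  # pairs of (parent node, local partial derivative)
        self.grad = 0.0

    def __add__(self, other):
        other = _lift(other)
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    __radd__ = __add__

    def __sub__(self, other):
        other = _lift(other)
        return Var(self.value - other.value, ((self, 1.0), (other, -1.0)))

    def __rsub__(self, other):
        return _lift(other) - self

    def __mul__(self, other):
        other = _lift(other)
        return Var(self.value * other.value,
                   ((self, other.value), (other, self.value)))

    __rmul__ = __mul__

    def __truediv__(self, other):
        other = _lift(other)
        return Var(self.value / other.value,
                   ((self, 1.0 / other.value),
                    (other, -self.value / other.value ** 2)))


def _lift(x):
    """Wrap a plain number as a constant node."""
    return x if isinstance(x, Var) else Var(float(x))


def log(x):
    """Natural logarithm with its local derivative 1/x."""
    return Var(math.log(x.value), ((x, 1.0 / x.value),))


def backward(output):
    """Apply the chain rule from the output back to every input."""
    order, seen = [], set()

    def visit(node):
        if id(node) in seen:
            return
        seen.add(id(node))
        for parent, _ in node.parents:
            visit(parent)
        order.append(node)  # a node is appended after all of its parents

    visit(output)
    output.grad = 1.0
    for node in reversed(order):  # each node is processed before its parents
        for parent, partial in node.parents:
            parent.grad += partial * node.grad


def normal_log_density_gradient(y, mu_value, sigma_value):
    """Return the log density (up to a constant) and its partials in mu, sigma."""
    mu, sigma = Var(mu_value), Var(sigma_value)
    total = Var(0.0)
    for y_i in y:
        z = (y_i - mu) / sigma
        total = total - log(sigma) - 0.5 * z * z
    backward(total)
    return total.value, mu.grad, sigma.grad
```

Calling `normal_log_density_gradient([1.0, 2.0, 4.0], 2.0, 1.5)` returns the function value together with both partial derivatives from one backward sweep. The sweep visits each recorded operation once, which is why the cost stays a small multiple of evaluating the function regardless of how many parameters there are.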
To maximize its accessibility to the scientific community, Stan is being written in standards-compliant C++ so that it will run under Windows, Macintosh, and Unix/Linux. To make it even easier to run, Stan is callable from R, MATLAB, and Python, the three most popular platforms for numerical analysis, including exploration and plotting.