As a social scientist turned data scientist, my graduate school training taught me a lot of frequentist statistics that has served me well in my career since. However, there’s one thing that frequentist statistics can’t solve, and that’s my lack of a personality. For that, we’ve got Bayesian statistics, the perfect substitute.
I’ve spent the last year or so gradually trying to become more Bayesian in my approach to statistical reasoning and analysis, and while there are quite a few really good resources for supporting that journey, it can be a little difficult to know where to start. I wanted someone to give me a short reading list that would save me trying to work out which books to use and in what order. In an attempt to fill that gap, I’ve put together a list of resources that I’ve come across and found to be pretty solid. I’ve not worked through each and every one of these cover to cover, but I’ve interacted with them enough to think they’re useful and that someone might be able to learn a lot from them.
Where possible, I’ve tried to find resources that are free and open-source. That hasn’t been possible in every case, but for each one I have at least found something the authors make freely available, whether it be the book itself, a video series, or a series of articles.
Becoming More Bayesian
These first resources are good starting points for anyone looking to start learning more about Bayesian statistics. They’ve all got plenty of detail, but they focus on making the intuition of Bayesianism a little more approachable.
Statistical Rethinking
I think the starting point for everybody is Richard McElreath’s Statistical Rethinking. If you read only one of these books, make it this one. It’s a great read for anybody interested in statistics and quantitative research.
The focus of Statistical Rethinking is using Bayesian statistics for scientific reasoning and causal inference. It’s a really great read, in part because it is accessible while also being pretty thorough. I particularly like McElreath’s consistent focus on how these tools apply to scientific models.
Not only is it available as a book, but Richard also makes a lecture series available on YouTube which can serve as an accompaniment to the book, or can even stand on its own as a good learning resource.
Student’s Guide to Bayesian Statistics
Ben Lambert’s A Student’s Guide to Bayesian Statistics is a really good dive into the statistics that underpin Bayesian analysis, and the implementation of the statistical methods. It’s not light on notation, but it does introduce new concepts by first explaining the intuition before laying it out mathematically. I think this book pairs very well with Statistical Rethinking, as they take slightly different approaches, and between the two you cover tons of ground.
You can buy the book on Amazon but it isn’t freely available anywhere (that I’ve found). Ben does offer a free YouTube course and lecture notes that mirror the book’s content though. The YouTube course doesn’t offer the kind of detail that the book does, but it is still pretty good.
Introduction to Empirical Bayes
If you’re looking for something to help you apply Bayesian statistics to a real-world situation, or you’re the kind of learner that needs a relatable hook to help you make sense of things (this is me), then David Robinson’s Introduction to Empirical Bayes is the perfect starting point. David applies Empirical Bayes methods (Bayesian approaches where the prior is estimated from the data) in a baseball context, specifically estimating batting averages. Baseball isn’t really the focus of David’s work here, but it serves as a really good medium for couching some quite complicated concepts in terms that people can understand. Thanks to this (and David’s excellent communication of the concepts), it is a really good opening gambit for anyone interested in Bayesian statistics.
The Introduction to Empirical Bayes book started out as a series of blog posts on David’s website, Variance Explained, and those blog posts are all still freely available, but his e-book also includes an extra chapter, some additional materials across several other chapters, and some minor edits and changes to help simplify things. The book is also available at a “pay-what-you-want” price, which means you can choose to pay nothing, if you can’t afford to pay more, but for those that can, the suggested price is $9.95 (~£8).
I don’t think that David’s e-book/blog posts are the most thorough of the resources listed here, but they’re also not trying to be. Instead, they’re a really good plain(ish)-English explanation of Bayesian concepts, applied to examples that should help a lot of people intuit them a little more easily. The use of Empirical Bayes serves as a good first step in the direction of Bayesian approaches, and it is also a good way to learn some really useful methods in real-world data analysis. If you find that some of the other resources are a little hard to wrap your head round, I’d read all of his posts (or buy his book if you can afford to) first, because it might help lay the groundwork for your learning journey.
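To make the "prior is estimated from the data" idea a bit more concrete, here is a minimal sketch in Python. It is not taken from David’s book or posts: the player records are made-up numbers, the function names `fit_beta_prior` and `shrink` are hypothetical, and the prior is fitted with a simple method-of-moments estimate on the raw averages, which is a simplification chosen to keep the example short.

```python
from statistics import mean, pvariance

# Hypothetical (hits, at_bats) records; made-up numbers for illustration only.
players = {
    "A": (4, 10),
    "B": (30, 100),
    "C": (140, 500),
    "D": (250, 1000),
    "E": (90, 300),
    "F": (27, 100),
}


def fit_beta_prior(records):
    """Estimate Beta(alpha, beta) from raw averages by the method of moments."""
    rates = [hits / at_bats for hits, at_bats in records]
    m = mean(rates)
    v = pvariance(rates)
    # For a Beta distribution, alpha + beta = m * (1 - m) / v - 1.
    total = m * (1 - m) / v - 1
    return m * total, (1 - m) * total


def shrink(hits, at_bats, alpha, beta):
    """Posterior mean of a Beta-Binomial model: the raw average pulled toward the prior."""
    return (hits + alpha) / (at_bats + alpha + beta)


# The prior comes from the data itself, which is what makes this "empirical" Bayes.
alpha, beta = fit_beta_prior(players.values())
estimates = {name: shrink(h, n, alpha, beta) for name, (h, n) in players.items()}

for name, (h, n) in players.items():
    print(f"{name}: raw={h / n:.3f} shrunk={estimates[name]:.3f} (at-bats={n})")
```

The thing to look for in the output is that a player with very few at-bats is pulled strongly toward the group average, while a player with a long record barely moves.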
Think Bayes
The vast majority of the resources listed here use R for implementation. I think that’s because the packages for computing Bayesian models in R are much easier to navigate than their alternatives in Python. However, if you’re coming at this as someone with limited or no knowledge of R and no interest in changing that, then Allen Downey’s Think Bayes is worth a look. Think Bayes takes a code-first approach to teaching Bayesian concepts, rather than leaning on mathematical notation and concepts like calculus. For anyone that uses Python and learns better when concepts are explained through code, this book is a great start.
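As a rough flavour of what "code-first" can look like, here is a small sketch of a Bayesian update done entirely with lists and arithmetic. It is not an example from Think Bayes: the coin-flip setup, the coarse grid of candidate values, and the `update` function are all assumptions made for illustration.

```python
# Candidate values for a coin's probability of heads (a coarse grid).
grid = [i / 10 for i in range(11)]

# Uniform prior: every candidate value starts out equally plausible.
prior = [1 / len(grid)] * len(grid)


def update(prior, grid, heads, tails):
    """Apply Bayes' rule on the grid: posterior is prior times likelihood, normalised."""
    unnormalised = [
        p * (theta ** heads) * ((1 - theta) ** tails)
        for p, theta in zip(prior, grid)
    ]
    total = sum(unnormalised)
    return [u / total for u in unnormalised]


# Hypothetical data: 7 heads and 3 tails.
posterior = update(prior, grid, heads=7, tails=3)

# The grid value with the highest posterior probability.
best = grid[posterior.index(max(posterior))]

for theta, p in zip(grid, posterior):
    print(f"theta={theta:.1f} posterior={p:.3f}")
```

Nothing here needs calculus. The normalising step is just a sum, which is a large part of why a code-first treatment can feel more approachable.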
Building a Stronger Understanding
The following resources would serve as great next steps for someone looking to build on their understanding of Bayesianism, whether it be an extremely thorough exploration of the mathematics underpinning the methods, or detailed documentation of the code for the computation of Bayesian methods.
Bayesian Data Analysis
Everything Andrew Gelman does is good, and Bayesian Data Analysis (co-authored with John B. Carlin, Hal S. Stern, David B. Dunson, Aki Vehtari, and Donald B. Rubin) is yet more proof of that. It’s a really extensive book covering the fundamentals of Bayesian inference, analysis, and methods.
I think Bayesian Data Analysis might be a little heavier going if you’re just starting out, but the book is also supplemented by an online course made available by Aki Vehtari, with lecture slides and videos to help bridge any gaps in understanding.
The book and the course are excellent resources for those looking to build on foundational knowledge (perhaps developed using the earlier resources). The book is probably the most thorough of all the resources here, so if you’re comfortable with mathematical notation and only want to read one book, this might be your best bet.
Stan
Stan is a probabilistic programming language for implementing Bayesian statistical models. It underpins the computation in many Bayesian modelling tools, and there are interfaces that make it accessible from a number of different languages, including R, Python, and Julia (for those of you looking to intimidate your peers).
Finally, there are books that are worth a mention because they are really good resources that, while not focused entirely on Bayesian statistics, do a great job of placing Bayesian approaches in the context of wider statistical methods.
Regression and Other Stories
More Gelman. I have no regrets! Regression and Other Stories is co-authored with Jennifer Hill and Aki Vehtari, and it will serve you brilliantly if you’re looking to learn about Bayesian approaches while also covering frequentist methods. It is a book that focuses on the application of regression to real-world problems and the challenges users will face in the process.
While the focus of the book is regression, it incorporates Bayesian inference and methods across the book, and this might be useful to anyone wanting a broader education in statistical analysis that includes Bayesian approaches.
Probabilistic Machine Learning
It’s not specifically a Bayesian machine learning book, but probabilistic machine learning incorporates a lot of Bayesian principles, and there are plenty of explicitly Bayesian tools in the book too. If you’ve cracked several of the resources in this list and are looking to take your next step, I think this would be a great direction.
I think that having a good understanding of Bayes’ rule is really important for anyone working in or around statistics, because it helps inform the way you reason about statistics, and it helps you interrogate your approach to analysis in a more thorough manner. Getting a good intuition for Bayesian thinking is relatively simple, but if you’re looking to go a little further, it does require a bit of work. I think there are tons of really good resources out there to take those next steps, but I don’t think they’re as easy to track down as they could be. Hopefully this list will bridge that gap for a few people.