Preface
This book is based on the notes we created for our students as part of a one-semester course on probability and statistics. We developed these notes from three primary resources. The most important is the OpenIntro book Introductory Statistics with Randomization and Simulation (Diez, Barr, and Çetinkaya-Rundel 2014). In places we have used their notes and homework problems, although in most cases we have altered their work to fit our needs. The second most important book for our work is Introduction to Probability and Statistics Using R (Kerns 2010). Finally, we have used some examples, code, and ideas from the first edition of Pruim’s book Foundations and Applications of Statistics: An Introduction Using R (R. J. Pruim 2011).
0.1 Who is this book for?
We designed this book for a study of statistics that maximizes computational ideas while minimizing algebraic symbol manipulation. Although we do discuss traditional small-sample, normal-based inference and some of the classical probability distributions, we rely heavily on ideas such as simulation, permutation tests, and the bootstrap. This means that students with a background in differential and integral calculus will be successful with this book.
The book makes extensive use of the R programming language. In particular, we focus on both the tidyverse and mosaic packages. We include a significant amount of code in our notes and frequently demonstrate multiple ways of completing a task. We have used this book with sophomores and juniors.
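To give a flavor of the computational style described above, here is a minimal sketch of a percentile bootstrap interval for a mean. It uses only base R rather than the tidyverse or mosaic functions used in the book, and the data are simulated purely for illustration; the object names `x`, `boot_means`, and `ci` are our own choices for this sketch.

```r
# Illustrative sketch only: simulated data, base R, hypothetical object names.
set.seed(2021)

# A hypothetical sample of 40 waiting times (not data from the book)
x <- rexp(40, rate = 1 / 5)

# Resample the data with replacement 2000 times, recomputing the mean each time
boot_means <- replicate(
  2000,
  mean(sample(x, size = length(x), replace = TRUE))
)

# Percentile bootstrap interval: the middle 95% of the resampled means
ci <- quantile(boot_means, probs = c(0.025, 0.975))

print(mean(x))
print(ci)
```

No formula for the standard error of the mean is needed; the spread of the resampled means stands in for it.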
0.2 Book structure and how to use it
This book is divided into four parts. Each part starts with a case study that introduces many of the main ideas of that part. Each chapter is designed to be a standalone 50-minute lesson. Within each lesson, we give exercises that can be worked in class and we provide learning objectives.
This book assumes students have access to R. Finally, we keep the number of homework problems to a reasonable level and assign all problems.
The four parts of the book are:
Descriptive Statistical Modeling: This part introduces the student to data collection methods, summary statistics, visual summaries, and exploratory data analysis.
Probability: We discuss the foundational ideas of probability, counting methods, and common distributions. We use both calculus and simulation to find moments and probabilities. We introduce basic ideas of multivariate probability. We include method of moments and maximum likelihood estimators.
Statistical Inference: We discuss many of the basic inference ideas found in a traditional introductory statistics class, but we add ideas from bootstrap and permutation methods.
Statistical Prediction: The final part introduces prediction methods mainly in the form of linear regression. This part also includes inference for regression.
The learning outcomes for this course are to use computational and mathematical statistical/probabilistic concepts for:
- Developing probabilistic models
- Developing statistical models for description, inference, and prediction
- Advancing practical and theoretical analytic experience and skills
0.3 Prerequisites
To take this course, students are expected to have completed calculus up through and including integral calculus. We do have multivariate ideas in the course, but they are easily taught and don’t require Calculus III. We don’t assume the students have any programming experience, and thus we include a great deal of code. We have historically supplemented the course with DataCamp courses. We have also used RStudio Cloud to help students get started without the burden of installing and maintaining software.
0.4 Packages
These notes make use of the following packages in R: knitr (Xie 2021b), rmarkdown (Allaire et al. 2021), mosaic (R. Pruim, Kaplan, and Horton 2021), mosaicCalc (Kaplan, Pruim, and Horton 2020), tidyverse (Wickham 2021), ISLR (James et al. 2017), vcd (Meyer, Zeileis, and Hornik 2020), ggplot2 (Wickham et al. 2021), MASS (Ripley 2021), openintro (Çetinkaya-Rundel et al. 2021), broom (Robinson, Hayes, and Couch 2021), infer (Bray et al. 2021), kableExtra (Zhu 2021), and DT (Xie, Cheng, and Tan 2021).
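One way to check which of these packages are already available on your machine is sketched below. It uses only base R; the names `pkgs`, `installed`, and `not_installed` are hypothetical names chosen for this sketch, and the installation line is left commented out on the assumption that you want to install from your default CRAN mirror.

```r
# Packages listed in this section
pkgs <- c(
  "knitr", "rmarkdown", "mosaic", "mosaicCalc", "tidyverse", "ISLR", "vcd",
  "ggplot2", "MASS", "openintro", "broom", "infer", "kableExtra", "DT"
)

# TRUE for each package that can be loaded, FALSE otherwise
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)

# Names of packages that still need to be installed
not_installed <- pkgs[!installed]

if (length(not_installed) > 0) {
  message("Not installed: ", paste(not_installed, collapse = ", "))
  # Assumption: uncomment to install from your default CRAN mirror
  # install.packages(not_installed)
} else {
  message("All listed packages are available.")
}
```

Students using a hosted environment such as RStudio Cloud may find that many of these packages are already provided by the instructor's workspace.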
0.5 Acknowledgements
We have been lucky to have numerous open resources to help facilitate this work. Thank you to those who helped correct mistakes, including Skyler Royse.
This book was written using the bookdown package (Xie 2021a).
This book is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.