Taken literally, the title "All of Statistics" is an exaggeration. But in spirit, the title is apt, as the book does cover a much broader range of topics than a typical introductory book on mathematical statistics. This book is for people who want to learn probability and statistics quickly. It is suitable for graduate or advanced undergraduate students in computer science, mathematics, statistics, and related disciplines. The book includes modern topics like non-parametric curve estimation, bootstrapping, and classification, topics that are usually relegated to follow-up courses. The reader is presumed to know calculus and a little linear algebra. No previous knowledge of probability and statistics is required. Statistics, data mining, and machine learning are all concerned with collecting and analysing data.
Larry A. Wasserman is a Canadian statistician and a professor in the Department of Statistics and the Machine Learning Department at Carnegie Mellon University.
Notes:
- The core problem in probability is "given a generating process, what does the output look like?"
- The core problem in statistics is "given some output, what does the generating process look like?"
- If A and B are disjoint events with non-zero probability, then they cannot be independent (because P(AB) = 0, but P(A), P(B) > 0). "Except in this special case, there is no way to judge independence by looking at the sets in a Venn diagram."
- Mistaking P(A|B) for P(B|A) is called the prosecutor's fallacy.
- The rule that P(B) = sum_i P(B|A_i)P(A_i) if {A_i} is a partition is called the Law of Total Probability.
- I don't understand this: you can't generally assign probabilities to all subsets of a sample space, so attention is restricted to sigma-fields.
- A geometric distribution is of the form P(X=k) = p(1-p)^{k-1}. Why is this called geometric? Because the probabilities form a geometric sequence in k.
- The rate of the sum of two independent Poisson distributions is the sum of the rates. That is, if X1 ~ Poisson(lambda1) and X2 ~ Poisson(lambda2) are independent, then X1+X2 ~ Poisson(lambda1+lambda2).
- The mathematical construction of a random variable is a mapping from the sample space Omega to R. Just like in computers.
- The standard normal distribution is denoted Z, with pdf and cdf denoted phi(z) and Phi(z).
- There is no closed form for Phi(z).
- If the Xi ~ N(mi, si^2) are independent, then sum Xi ~ N(sum mi, sum si^2).
- The logic of the Gamma, Cauchy, and chi-squared distributions continues to elude me.
- The Cauchy distribution is like the Gaussian, but with thicker tails. It's a special case of the t-distribution.
- The multinomial distribution has binomial distributions as its marginals.
- How to find the pdf of a transformation X -> Y of a random variable (a worked check follows these notes): 1) find the pre-image for each y, 2) evaluate the CDF on the pre-image, 3) differentiate the CDF to get the pdf.
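A quick numerical sanity check of two of these facts (my own sketch, not from the book; assumes numpy and scipy are installed). For the transformation recipe, take Y = X^2 with X ~ N(0,1): step 1, the pre-image of {Y <= y} is [-sqrt(y), sqrt(y)]; step 2, F_Y(y) = Phi(sqrt(y)) - Phi(-sqrt(y)) = 2 Phi(sqrt(y)) - 1; step 3, differentiating gives f_Y(y) = phi(sqrt(y)) / sqrt(y), which is exactly the chi-squared pdf with 1 degree of freedom.

```python
# Sketch: numerically check two facts from the notes (assumes numpy + scipy).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 1_000_000

# 1) Sum of independent Poissons: Poisson(2) + Poisson(3) ~ Poisson(5).
s = rng.poisson(2.0, n) + rng.poisson(3.0, n)
print(s.mean(), s.var())  # both should be ~5.0, as Poisson(5) requires

# 2) Three-step transformation method on Y = X^2, X ~ N(0,1):
#    the recipe gives f_Y(y) = phi(sqrt(y)) / sqrt(y), the chi-squared(1) pdf.
y = np.linspace(0.1, 4.0, 50)
derived = stats.norm.pdf(np.sqrt(y)) / np.sqrt(y)
print(np.allclose(derived, stats.chi2.pdf(y, df=1)))  # True
```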
Very good reference on notions of probability, statistics, and machine learning. Not ideal for learning the material from scratch, but ideal for refreshing and supplementing your knowledge when you do a PhD.
From the title, one expects this book to be comprehensive and encyclopedic, but I found the opposite to be the case. This is a very mathematical rapid survey of statistics that does not explain how to actually do any of the things that a working engineer or scientist would need to do.
I think the audience of this book is "mathematicians who find books with more equations than text to be comfortable and easy to learn from, who also know nothing about statistics and want a quick survey of the field, and who will use statistics to prove theorems and write papers instead of actually calculating anything." This book is completely unsuitable for engineers; for those I would recommend Baclawski and then Diez. Even Casella & Berger is much more accessible than this book.
This could possibly be useful as a reference book. Otherwise, it's math without any explanations, unless you find symbol manipulation explanatory. I'm not afraid of math (I minored in it and have a degree in CS), but I don't understand a formula without first understanding the concepts behind it. I expect this is true of most people. It is pretty funny to me that this book is billed as 'for people who want to learn probability and statistics quickly... No previous knowledge of probability and statistics is required.'
I studied statistics two or three times on campus, but I still find this book too hard and not suitable for beginners: some of the symbols in the theorems come from nowhere, and some of the definitions need further explanation. I could follow it up to chapter 7, but beyond that the symbols were more than I could remember or understand.
Great for a high-level overview of a very broad space (i.e. all of modern statistics). I definitely skimmed most of the equations, and some chapters, but still got a lot out of it as a refresher with good intuition; it connects statistics and computer science well.
Quotes - "The basic problem that we study in probability is: Given a data generating process, what are the properties of the outcomes?... the basic problem of statistical inference is the inverse of probability: given the outcomes, what can we say about the process that generated the data?" - "many inferential problems can be identified as being one of three types: estimation, confidence sets, or hypothesis testing." - "Confidence intervals are often more informative than tests." Because they also give magnitude and area easier to interpret. - "The p-value is not the probability that the null hypothesis is true." - "To combine prior beliefs with data in a principled way, use Bayesian inference. To construct procedures with guaranteed long run performance, such as confident intervals, use frequentist methods." - Choosing among ways to generate estimators is decision theory. "Decision theory which is the formal theory for comparing statistical procedures.... and estimator is sometimes called a decision rule." Meausured using a loss function such as squared error loss, absolute loss, etc. Usually we use squared error lose (e.g. mean squared error). - "[AIC: Akaike Information Criterion] can be thought of 'goodness of fit' minus 'complexity.'" - The BIC (Bayesian Information Criterion) is similar but puts a more severe penalty for complexity. - "In forward stepwise regression, we start with no covariates in the model. We then add the one variable that leads to the best score we continue adding variables one at a time until the score does not improve. Backwards stepwise regression is the same except we start with the biggest model and drop one variable at a time. Both are greedy searches: neither is guaranteed to find the model with the best score." [Note: page 221 typo: nether vs neither]. "Another popular method is to do random searching through the set of all models. However, there is no reason to expect this to be superior to a deterministic search." For example we might use stepwise regression using AIC as our score. - "roughly speaking, the statement 'X causes Y' means that changing the value of X will change the distribution of Y. When X causes Y, X and Y will be associated but the reverse is not, in general, true. Association does not necessarily imply causation." - "Even after adjusting for confounders, we cannot be sure that there are not other confounding variables that we missed. This is why observational studies must be treated with healthy skepticism. Results from observational studies start to become believable when: (i) the results are replicated in many studies, (ii) each of these studies controlled for plausible confounding variables, (iii) there is a plausible scientific explanation for the existence of a causal relationship... a good example is smoking and cancer." - X (parent) -> Y(child). Where X causes Y X-> Y <- Z, is called a collider at Y. - The curse of dimensionality. "To get a sense of how serious this problem is, consider the following table from Silverman (1986) which shows the sample size required to ensure a relative mean squared error less than 0.1 at 0 when the density is multicariate normal and the optimal bandwidth is selected.... that is bad news indeed. It says that having 824,000 observations in a ten-dimensional problem is really like having 4 observations in a one-dimensional problem." - Statistics -> computater science terms: Classification -> supervised learning. Predicting a discrete Y from X. 
Data -> training sample Covariates -> features Classifier -> hypothesis. Map X -> Y Estimation -> learning. Finding a good classifier
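Since the notes call out forward stepwise regression scored by AIC, here is a minimal sketch of that greedy search, assuming numpy and statsmodels are installed (the function name and the toy data are mine for illustration, not the book's):

```python
# Sketch: forward stepwise regression scored by AIC (assumes numpy + statsmodels).
# A greedy search -- not guaranteed to find the model with the best score.
import numpy as np
import statsmodels.api as sm

def forward_stepwise(X, y):
    """Greedily add the covariate that most lowers AIC; stop when none helps."""
    n, p = X.shape
    selected, remaining = [], list(range(p))
    best_aic = sm.OLS(y, np.ones((n, 1))).fit().aic  # intercept-only model
    improved = True
    while improved and remaining:
        improved = False
        # Score every one-variable extension of the current model.
        scores = [(sm.OLS(y, sm.add_constant(X[:, selected + [j]])).fit().aic, j)
                  for j in remaining]
        aic, j = min(scores)
        if aic < best_aic:  # keep the addition only if AIC improves
            best_aic, improved = aic, True
            selected.append(j)
            remaining.remove(j)
    return selected, best_aic

# Toy data: only the first two of five covariates actually matter.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 2 * X[:, 0] - X[:, 1] + rng.normal(size=200)
print(forward_stepwise(X, y))  # typically selects [0, 1]
```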
It was my first statistics book and I disliked it, since the author does a poor job of explaining the details. If you are new to statistics without a lot of training in mathematics, ANY other book would be better than this one.
So far the best statistics book I have ever read. Broad range of subjects, enough examples to give some idea about the concepts. It requires a bit of calculus knowledge though.
A great book, nice to keep as a reference and occasional refresher. Covers an impressive range of topics. Well written and holds up remarkably well despite its age.
10/15/2015: So far, this is a really good book with comprehensive material, simple examples, and rich problems; most importantly, it is easy to understand.
12/8/2015: I like everything about this book except the title. It may draw some complaints for not discussing some topics in depth, but one can always look up and read more on topics of interest. Nonetheless, this is a very well written book!
The author states that he wrote the book to help get engineering students up to speed. The topics and depth are in line with what one would expect from a mathematical statistics book. It's a good book for finding out what is out there, but most discussions are too brief for most people to learn the material from this book.
The material in this book is not covered in sufficient depth to understand it unless you have covered it once already. That said, this book is a great reference: a collection of useful theorems and properties.
Great exposition of probability and statistics from an abstract, theorem-oriented point of view, similar to Linear Algebra Done Right. Most haters are probably not comfortable with proof-style math.