R for Data Science: Import, Tidy, Transform, Visualize, and Model Data

Hadley Wickham

Learn how to use R to turn raw data into insight, knowledge, and understanding. This book introduces you to R, RStudio, and the tidyverse, a collection of R packages designed to work together to make data science fast, fluent, and fun. Suitable for readers with no previous programming experience, R for Data Science is designed to get you doing data science as quickly as possible.

Authors Hadley Wickham and Garrett Grolemund guide you through the steps of importing, wrangling, exploring, and modeling your data and communicating the results. You'll get a complete, big-picture understanding of the data science cycle, along with basic tools you need to manage the details. Each section of the book is paired with exercises to help you practice what you've learned along the way.

You'll learn how to:

- Wrangle: transform your datasets into a form convenient for analysis
- Program: learn powerful R tools for solving data problems with greater clarity and ease
- Explore: examine your data, generate hypotheses, and quickly test them
- Model: provide a low-dimensional summary that captures true "signals" in your dataset
- Communicate: learn R Markdown for integrating prose, code, and results

Genres: Programming, Computer Science, Nonfiction, Science, Reference, Technical, Textbooks

521 pages, Kindle Edition

First published December 12, 2016

866 people are currently reading

2216 people want to read

About the author

Hadley Wickham

19 books, 182 followers

Community Reviews

5 stars: 772 (65%)
4 stars: 308 (26%)
3 stars: 77 (6%)
2 stars: 8 (<1%)
1 star: 14 (1%)

Displaying 1 - 29 of 110 reviews

Scott Davidson

2 reviews, 5 followers

January 30, 2017

What a great book

I am an architect who got into studying data analysis as kind of a weird mid-life crisis. After some Coursera classes and a few books, I am finally starting to really understand R. But this book and the Tidyverse set of packages are a game changer. So much clearer and more intuitive. I highly recommend this book! Buy it.

Geoff

994 reviews, 122 followers

August 18, 2018

Nothing has made me love coding in R more than the tidyverse. A masterpiece of technical educational writing.

work

☘Misericordia☘ ⚡ϟ⚡⛈⚡☁ ❇️❤❣

2,519 reviews, 19.2k followers

January 21, 2020

Free version published here: https://r4ds.had.co.nz/

Joshua Hruzik

17 reviews, 6 followers

April 22, 2017

The new bible for R
Hadley Wickham transformed how we use R and accelerated its capabilities by a large margin. His work has been condensed into a single package called "tidyverse" which introduces tools that range from data transformation to data presentation. While his new book R for Data Science is written for beginners, even experienced users will find many resources that will make them better at R. This is because (clumsy) base packages are rarely used.

If you use R, you must have read this book, no matter how experienced you are!

Sofia

431 reviews, 2 followers

Read

December 20, 2022

For work :(

Terran M

78 reviews, 103 followers

March 22, 2018

This book is an excellent gentle introduction to data analysis and exploration in R. I especially recommend it as the 1st book for software engineers who want to move into data science.

Because the "tidyverse" libraries are very "magical", making extensive use of nonstandard syntactical features like unquoted column names, this book does not teach good general programming practices. Therefore I do not recommend it if you are unfamiliar with programming and want to learn.

I recommend skipping Part IV entirely, as I feel the attempt at introducing regression in a non-mathematical way is largely a failure. I like the book in spite of this shortcoming.

This book is available for free from the author at http://r4ds.had.co.nz

Adam Zabell

110 reviews, 16 followers

February 24, 2018

I'm still not convinced I want to switch over from being an R-core person to a tidyR person. But, for the sake of the short course I helped build, I "loved it for thirty minutes." Doing that reminded me how much I love pipes, and I learned how amazingly easy it is to perform graph faceting with ggplot.

If you're just starting out with R, start here. It's going to give you the tools you need to do the work you want, and it'll train your brain to think in a way that aligns your data with your goals. If you've been using R at an intermediate-to-advanced level already, I'm not as convinced. You have a way that works for you, and you're getting what you need. Ultimately, R is just a tool to get what you need, and whether you build that with pipes, or a function, or a whole bunch of nested for-loops, you understand your code and that's just fine.

Probably the biggest indicator that I'm going to be a grudging convert? The work I'm doing right now feels constrained by the fact that I'm annotating with long comments instead of committing to Rmarkdown. If you ask me in a year, I'll let you know where I land.

business reference science

Teo

850 reviews, 88 followers

April 4, 2020

2018.08.21–2019.09.02

Contents (on 2018-08-21; might be updated online)

Wickham H & Grolemund G (2016) R for Data Science - Import, Tidy, Transform, Visualize, and Model Data

Welcome

1. Introduction
1.1 What you will learn
1.2 How this book is organised
1.3 What you won’t learn
1.3.1 Big data
1.3.2 Python, Julia, and friends
1.3.3 Non-rectangular data
1.3.4 Hypothesis confirmation
1.4 Prerequisites
1.4.1 R
1.4.2 RStudio
1.4.3 The tidyverse
1.4.4 Other packages
1.5 Running R code
1.6 Getting help and learning more
1.7 Acknowledgements
1.8 Colophon

Part I: Explore

2. Introduction

3. Data visualisation
3.1 Introduction
3.1.1 Prerequisites
3.2 First steps
3.2.1 The mpg data frame
3.2.2 Creating a ggplot
3.2.3 A graphing template
3.2.4 Exercises
3.3 Aesthetic mappings
3.3.1 Exercises
3.4 Common problems
3.5 Facets
3.5.1 Exercises
3.6 Geometric objects
3.6.1 Exercises
3.7 Statistical transformations
3.7.1 Exercises
3.8 Position adjustments
3.8.1 Exercises
3.9 Coordinate systems
3.9.1 Exercises
3.10 The layered grammar of graphics

4. Workflow: basics
4.1 Coding basics
4.2 What’s in a name?
4.3 Calling functions
4.4 Practice

5. Data transformation
5.1 Introduction
5.1.1 Prerequisites
5.1.2 nycflights13
5.1.3 dplyr basics
5.2 Filter rows with filter()
5.2.1 Comparisons
5.2.2 Logical operators
5.2.3 Missing values
5.2.4 Exercises
5.3 Arrange rows with arrange()
5.3.1 Exercises
5.4 Select columns with select()
5.4.1 Exercises
5.5 Add new variables with mutate()
5.5.1 Useful creation functions
5.5.2 Exercises
5.6 Grouped summaries with summarise()
5.6.1 Combining multiple operations with the pipe
5.6.2 Missing values
5.6.3 Counts
5.6.4 Useful summary functions
5.6.5 Grouping by multiple variables
5.6.6 Ungrouping
5.6.7 Exercises
5.7 Grouped mutates (and filters)
5.7.1 Exercises

6. Workflow: scripts
6.1 Running code
6.2 RStudio diagnostics
6.3 Practice

7. Exploratory Data Analysis
7.1 Introduction
7.1.1 Prerequisites
7.2 Questions
7.3 Variation
7.3.1 Visualising distributions
7.3.2 Typical values
7.3.3 Unusual values
7.3.4 Exercises
7.4 Missing values
7.4.1 Exercises
7.5 Covariation
7.5.1 A categorical and continuous variable
7.5.1.1 Exercises
7.5.2 Two categorical variables
7.5.2.1 Exercises
7.5.3 Two continuous variables
7.5.3.1 Exercises
7.6 Patterns and models
7.7 ggplot2 calls
7.8 Learning more

8. Workflow: projects
8.1 What is real?
8.2 Where does your analysis live?
8.3 Paths and directories
8.4 RStudio projects
8.5 Summary

Part II: Wrangle

9. Introduction

10. Tibbles
10.1 Introduction
10.1.1 Prerequisites
10.2 Creating tibbles
10.3 Tibbles vs data.frame
10.3.1 Printing
10.3.2 Subsetting
10.4 Interacting with older code
10.5 Exercises

11. Data import
11.1 Introduction
11.1.1 Prerequisites
11.2 Getting started
11.2.1 Compared to base R
11.2.2 Exercises
11.3 Parsing a vector
11.3.1 Numbers
11.3.2 Strings
11.3.3 Factors
11.3.4 Dates, date-times, and times
11.3.5 Exercises
11.4 Parsing a file
11.4.1 Strategy
11.4.2 Problems
11.4.3 Other strategies
11.5 Writing to a file
11.6 Other types of data

12. Tidy data
12.1 Introduction
12.1.1 Prerequisites
12.2 Tidy data
12.2.1 Exercises
12.3 Spreading and gathering
12.3.1 Gathering
12.3.2 Spreading
12.3.3 Exercises
12.4 Separating and uniting
12.4.1 Separate
12.4.2 Unite
12.4.3 Exercises
12.5 Missing values
12.5.1 Exercises
12.6 Case Study
12.6.1 Exercises
12.7 Non-tidy data

13. Relational data
13.1 Introduction
13.1.1 Prerequisites
13.2 nycflights13
13.2.1 Exercises
13.3 Keys
13.3.1 Exercises
13.4 Mutating joins
13.4.1 Understanding joins
13.4.2 Inner join
13.4.3 Outer joins
13.4.4 Duplicate keys
13.4.5 Defining the key columns
13.4.6 Exercises
13.4.7 Other implementations
13.5 Filtering joins
13.5.1 Exercises
13.6 Join problems
13.7 Set operations

14. Strings
14.1 Introduction
14.1.1 Prerequisites
14.2 String basics
14.2.1 String length
14.2.2 Combining strings
14.2.3 Subsetting strings
14.2.4 Locales
14.2.5 Exercises
14.3 Matching patterns with regular expressions
14.3.1 Basic matches
14.3.1.1 Exercises
14.3.2 Anchors
14.3.2.1 Exercises
14.3.3 Character classes and alternatives
14.3.3.1 Exercises
14.3.4 Repetition
14.3.4.1 Exercises
14.3.5 Grouping and backreferences
14.3.5.1 Exercises
14.4 Tools
14.4.1 Detect matches
14.4.2 Exercises
14.4.3 Extract matches
14.4.3.1 Exercises
14.4.4 Grouped matches
14.4.4.1 Exercises
14.4.5 Replacing matches
14.4.5.1 Exercises
14.4.6 Splitting
14.4.6.1 Exercises
14.4.7 Find matches
14.5 Other types of pattern
14.5.1 Exercises
14.6 Other uses of regular expressions
14.7 stringi
14.7.1 Exercises

15. Factors
15.1 Introduction
15.1.1 Prerequisites
15.1.2 Learning more
15.2 Creating factors
15.3 General Social Survey
15.3.1 Exercise
15.4 Modifying factor order
15.4.1 Exercises
15.5 Modifying factor levels
15.5.1 Exercises

16. Dates and times
16.1 Introduction
16.1.1 Prerequisites
16.2 Creating date/times
16.2.1 From strings
16.2.2 From individual components
16.2.3 From other types
16.2.4 Exercises
16.3 Date-time components
16.3.1 Getting components
16.3.2 Rounding
16.3.3 Setting components
16.3.4 Exercises
16.4 Time spans
16.4.1 Durations
16.4.2 Periods
16.4.3 Intervals
16.4.4 Summary
16.4.5 Exercises
16.5 Time zones

Part III: Program

17. Introduction
17.1 Learning more

18. Pipes
18.1 Introduction
18.1.1 Prerequisites
18.2 Piping alternatives
18.2.1 Intermediate steps
18.2.2 Overwrite the original
18.2.3 Function composition
18.2.4 Use the pipe
18.3 When not to use the pipe
18.4 Other tools from magrittr

19. Functions
19.1 Introduction
19.1.1 Prerequisites
19.2 When should you write a function?
19.2.1 Practice
19.3 Functions are for humans and computers
19.3.1 Exercises
19.4 Conditional execution
19.4.1 Conditions
19.4.2 Multiple conditions
19.4.3 Code style
19.4.4 Exercises
19.5 Function arguments
19.5.1 Choosing names
19.5.2 Checking values
19.5.3 Dot-dot-dot (…)
19.5.4 Lazy evaluation
19.5.5 Exercises
19.6 Return values
19.6.1 Explicit return statements
19.6.2 Writing pipeable functions
19.7 Environment

20. Vectors
20.1 Introduction
20.1.1 Prerequisites
20.2 Vector basics
20.3 Important types of atomic vector
20.3.1 Logical
20.3.2 Numeric
20.3.3 Character
20.3.4 Missing values
20.3.5 Exercises
20.4 Using atomic vectors
20.4.1 Coercion
20.4.2 Test functions
20.4.3 Scalars and recycling rules
20.4.4 Naming vectors
20.4.5 Subsetting
20.4.6 Exercises
20.5 Recursive vectors (lists)
20.5.1 Visualising lists
20.5.2 Subsetting
20.5.3 Lists of condiments
20.5.4 Exercises
20.6 Attributes
20.7 Augmented vectors
20.7.1 Factors
20.7.2 Dates and date-times
20.7.3 Tibbles
20.7.4 Exercises

21. Iteration
21.1 Introduction
21.1.1 Prerequisites
21.2 For loops
21.2.1 Exercises
21.3 For loop variations
21.3.1 Modifying an existing object
21.3.2 Looping patterns
21.3.3 Unknown output length
21.3.4 Unknown sequence length
21.3.5 Exercises
21.4 For loops vs functionals
21.4.1 Exercises
21.5 The map functions
21.5.1 Shortcuts
21.5.2 Base R
21.5.3 Exercises
21.6 Dealing with failure
21.7 Mapping over multiple arguments
21.7.1 Invoking different functions
21.8 Walk
21.9 Other patterns of for loops
21.9.1 Predicate functions
21.9.2 Reduce and accumulate
21.9.3 Exercises

Part IV: Model

22. Introduction
22.1 Hypothesis generation vs hypothesis confirmation

23. Model basics
23.1 Introduction
23.1.1 Prerequisites
23.2 A simple model
23.2.1 Exercises
23.3 Visualising models
23.3.1 Predictions
23.3.2 Residuals
23.3.3 Exercises
23.4 Formulas and model families
23.4.1 Categorical variables
23.4.2 Interactions (continuous and categorical)
23.4.3 Interactions (two continuous)
23.4.4 Transformations
23.4.5 Exercises
23.5 Missing values
23.6 Other model families

24. Model building
24.1 Introduction
24.1.1 Prerequisites
24.2 Why are low quality diamonds expensive?
24.2.1 Price and carat
24.2.2 A more complicated model
24.2.3 Exercises
24.3 What affects the number of flights?
24.3.1 Day of week
24.3.2 Seasonal Saturday effect
24.3.3 Computed variables
24.3.4 Time of year: an alternative approach
24.3.5 Exercises
24.4 Learning more about models

25. Many models
25.1 Introduction
25.1.1 Prerequisites
25.2 gapminder
25.2.1 Nested data
25.2.2 List-columns
25.2.3 Unnesting
25.2.4 Model quality
25.2.5 Exercises
25.3 List-columns
25.4 Creating list-columns
25.4.1 With nesting
25.4.2 From vectorised functions
25.4.3 From multivalued summaries
25.4.4 From a named list
25.4.5 Exercises
25.5 Simplifying list-columns
25.5.1 List to vector
25.5.2 Unnesting
25.5.3 Exercises
25.6 Making tidy data with broom

Part V: Communicate

26. Introduction

27. R Markdown
27.1 Introduction
27.1.1 Prerequisites
27.2 R Markdown basics
27.2.1 Exercises
27.3 Text formatting with Markdown
27.3.1 Exercises
27.4 Code chunks
27.4.1 Chunk name
27.4.2 Chunk options
27.4.3 Table
27.4.4 Caching
27.4.5 Global options
27.4.6 Inline code
27.4.7 Exercises
27.5 Troubleshooting
27.6 YAML header
27.6.1 Parameters
27.6.2 Bibliographies and Citations
27.7 Learning more

28. Graphics for communication
28.1 Introduction
28.1.1 Prerequisites
28.2 Label
28.2.1 Exercises
28.3 Annotations
28.3.1 Exercises
28.4 Scales
28.4.1 Axis ticks and legend keys
28.4.2 Legend layout
28.4.3 Replacing a scale
28.4.4 Exercises
28.5 Zooming
28.6 Themes
28.7 Saving your plots
28.7.1 Figure sizing
28.7.2 Other important options
28.8 Learning more

29. R Markdown formats
29.1 Introduction
29.2 Output options
29.3 Documents
29.4 Notebooks
29.5 Presentations
29.6 Dashboards
29.7 Interactivity
29.7.1 htmlwidgets
29.7.2 Shiny
29.8 Websites
29.9 Other formats
29.10 Learning more

30. R Markdown workflow

_contents _nonfiction automation

Josh Lebowitz

9 reviews, 1 follower

August 9, 2021

This book provides an amazing introduction to coding in R. I would highly recommend it to anyone looking to learn the programming language.

Leonardo

Author of 1 book, 77 followers

read-in-part

September 8, 2020

We used it in the CEPAL course.

ai mecon

Risa

85 reviews

July 21, 2021

It's unreal to me how helpful this book has already been for my coding - it made me want to be in the data analysis/graph creation part of my Master's program.

I had no idea the Tidyverse could do most of the things described in this book! Additionally, the writing was straightforward, clear, and even funny at times. However, I would have been a bit lost if I wasn't a regular R user already (I'd not call this an intro book); many of the R basics are covered out of order - which is intuitive and great since I've used the software lots, but may have had me foundering if that wasn't the case.

Mikhael Hayes

91 reviews

April 30, 2025

Okay, my professor didn't require that we read every single chapter, but I spent enough time in it that I feel no shame marking it read. To even things out, I'll just not count 'The Art of R Programming' even though I'm hundreds of pages in.

Rahul

31 reviews, 1 follower

June 28, 2018

Excellent comprehensive reference to the tidyverse set of R packages. A couple of mistakes are present in the print version, but have been corrected in the publicly accessible online version. Good to read this cover-to-cover, but can also use as a reference when necessary.

data-science-machine-learning statistics

Philipp

679 reviews, 218 followers

August 11, 2017

This is my new go-to book when someone asks me for an introduction to R.

It teaches 'modern' R in the form of the tidyverse (sometimes called the Hadleyverse), the set of packages the author has written that essentially replace most of the basic functionality of R. The tidyverse is logically coherent: someone actually sat down and thought things through. Base R grew over decades and it shows; many hands don't make a coherent structure.

The book follows a logical 'story', starting with 1) data exploration, importing, and visualization to find patterns in data, followed by 2) data wrangling, i.e., subsetting and munging and tidying, followed by 3) actual programming - pipes, functions, iterating, then comes 4) modeling, how to build models and a few common models used in R, and it ends with 5), communicating, which is about various ways of writing reproducible reports and how to get publication-ready plots.

Maybe you can get an idea of the contents by the books Wickham recommends:
- The Truthful Art: Data, Charts, and Maps for Communication
- Presentation Patterns: Techniques for Crafting Better Presentations
- The Non-Designer's Design Book
- Style: Lessons in Clarity and Grace
- The Sense of Structure: Writing from the Reader's Perspective
- Statistical Modeling: A Fresh Approach

I have a few minor nitpicks:
- Some sections use the cars and diamonds datasets, and I can't stand those datasets anymore. Then again, iris never shows up...
- Sometimes the author goes a bit overboard and the examples get too complex. I wouldn't be able to make these plots even if you told me that I had just read how to make them.

Anyway, these are minor, to me this book is the best way if you want to see what modern R has to offer.

non-fiction programming

Sam Peterson

176 reviews, 8 followers

September 27, 2024

This book rocks my world fr

Andreas Aristidou

55 reviews, 2 followers

May 30, 2020

Whatever R-related product comes from Hadley Wickham, whether a package or a book or tutorial, is gold. After reading his ggplot2 book (and being amazed by the clarity and teaching prowess of the author) I dove into this one... I read the whole book in two weeks.

In many ways it is similar to the ggplot2 book. It's an introductory book, but at the same time one that covers a wide range of topics and does go somewhat deep. More importantly, it's a book that (like the ggplot2 book) will serve as a guideline for the future. It's almost impossible to learn everything in this book in one go - you'll most likely need years to do so, and that's OK. The book is organized so well that you can easily refer back to it whenever you need some concept. That's how I intended to use it. I went through it without doing the exercises, making notes throughout the book so that I got a sense of what's there and where to find things I might need when coding. I strongly believe it would be counterproductive to try to learn everything before you start coding.

The topics covered range from importing data, to data wrangling, graphing, modeling, and R markdown. Pretty much all you need to get started as a data scientist. Excellent. Could not recommend enough!

Matija

93 reviews, 24 followers

June 21, 2017

Finally, a good book on R that doesn't presume you are a stats buff. This book actually doesn't presume anything about you, and is written very clearly and with the right amount of to-the-point examples to get you going. I don't know how much my previous familiarity with R based on various other books and resources helped me get into the groove with this one, but R for Data Science by Hadley Wickham and Garrett Grolemund definitely taught me something I didn't know before in the least amount of time since I started reading it. The problem I now have is that Tidyverse introduces a different (and better) way of working with core R constructs, which also happens to be somewhat incompatible (or at least poorly fitting) with a number of decades of existing online R lore. Still, I'm happy I now have a clear guide to a better way of doing things in one book. I would recommend this as the first book on R for anyone who wants to get on board.

An Te

386 reviews, 26 followers

May 10, 2022

A splendid book on the language! The exercises are fun and engaging, and the coding wisdom from the author is ever so helpful for navigating and troubleshooting. To be frank, I think this is the one place to start learning R that covers all that it can do. Naturally, other books elaborate on this further, but there is no doubt how well Hadley and all the contributors to the book have worked to produce a streamlined and comprehensive book 📕. Well done.

There are some difficult sections on listing models and nesting them, but the author forewarns that these may need to be returned to at a later date. So this book will certainly not be a one-time read.

It is truly a book demonstrating well the power of good pedagogy on a technical matter. Seriously good stuff.

Solutions are also available free online if you search for them.

Chip

19 reviews

January 31, 2017

I think Hadley's contributions to `R` have been amazing, even staggering. It's good to see that he's taken all of the ideas of the `tidyverse` world and put them into a written format.

If you've used `R` in the past but mainly use base functions then this will be a great refresher for you. If you're new to the world of `R` then this book will give you a solid foundation of how to get started.

April

2 reviews

October 10, 2017

This book is like a filtered down set of help files. There are a few sentences that are actually helpful, but the rest of the book is terminology, questions without answers. It almost feels like a mad scientist shared his notebook with us. Not very helpful. Maybe I just needed a decoder ring.

Jakob

21 reviews, 1 follower

August 6, 2023

"R für Data Science" by Hadley Wickham is undoubtedly a suitable book for anyone working in data science and data analysis. In this comprehensive work, the author succeeds in conveying complex concepts and methods of data analysis in R in a clear and practical way. The book is particularly suited to readers who already have some prior knowledge of R or of data analysis.
I especially liked the chapter on models, in which Hadley Wickham discusses how to build practical models. In the world of data science, the point is to develop models that are not only theoretically correct but also applicable and interpretable in the real world. This book makes an excellent contribution here by showing readers how to build models that solve real business problems and deliver useful insights.
The chapter on models begins with a well-founded introduction to the theoretical foundations. Wickham explains the different kinds of models, from simple linear models to complex machine learning models. Throughout, he remains easy to follow and avoids unnecessary jargon, which is a particular advantage for readers with little modeling experience.
A highlight of this book is its emphasis on what makes a good model. Wickham not only presents methods for building models, he also explains the important aspects that make a model a useful tool in data analysis. He covers model validation, checking model accuracy, and interpreting model results. This practical approach is extremely valuable because it helps the reader not only build models but also evaluate them critically and put the results to good use.
Another plus of this book is its good structure. It is organized so that each chapter deals with an important step in data analysis. It begins with preparing data, then moves through programming and modeling, and ends with communicating.
The book also contains many practical examples and exercises that support the learning process and deepen understanding. The examples are based on real datasets and real problems, which helps readers apply what they have learned in real scenarios.
In summary, "R für Data Science" by Hadley Wickham is an excellent book that offers deep insight into the world of data analysis and models in R. The chapter on models in particular shines with practical approaches to building and evaluating good models. Wickham's writing style is clear and easy to understand, which makes learning and applying the concepts easier. This book is a must for anyone who wants to take their data analysis skills in R to a higher level, or who does research and wants to prepare and present their results and statistics.

Reinhardt

234 reviews, 2 followers

June 15, 2020

Superb!

This book is one of the best learn-to-code books I’ve come across. The organization is well thought out and pedagogically sound. Many books on code are haphazardly assembled by someone who knows the subject matter but doesn’t consider how best to teach it. Hadley and crew clearly know how to teach.

And they know how to write. The authors are the Hemingways of tech. The language is so silky smooth you never stumble over words. Rereading a sentence is simply never required. Stunning considering the technical nature of the content. The paragraphs are structured in such a way that you progress from the known to unknown. The style is light and conversational but never eccentric. They manage to keep the prose as light as a feather without any fluff. No wasted words here. Pure joy to read.

The only time it felt like I might be in deep water is the first chapter on modeling. You can sense their excitement with the subject and they are off and running, pointing to great adventure like a kid in Disneyland while you’re struggling to keep up. The second chapter on modeling you get your sea legs and the sensation of drowning fades.

The book doesn’t delve into statistical concepts, but rather focuses on how to process, analyze, and communicate data with the R programming language, something R is specifically designed to do.

If you’re thinking about learning a programming language to help with data analysis, stop, do not pass go, get this book. It includes a great set of exercises that prompt you to explore on your own.

In short, get this book.

Carlos

65 reviews

October 24, 2023

This book is solid. It helped me develop a professional level data science project using only the R programming language and its libraries.

Hadley Wickham, a prominent figure in the R community, offers a structured and practical approach to data science with R. The book covers the entire data analysis workflow, from importing and tidying data to visualizing and modeling it. It's not just a guide to R; it's a guide to the entire process of working with data effectively.

The book introduces you to the principles of tidy data, which is an organized and consistent way of structuring your data for analysis. You'll learn how to manipulate and transform data, create data visualizations using ggplot2, and build models to gain insights from your data.

What makes this book particularly valuable is its real-world applicability. Wickham provides practical examples and exercises that allow you to apply what you've learned to real data analysis tasks. The book doesn't just teach you R syntax; it teaches you how to think like a data scientist.

By the time you've worked your way through "R for Data Science," you'll be equipped with the knowledge and skills to tackle data analysis projects, from data cleaning and visualization to modeling and interpretation. Whether you're in academia, industry, or any field that deals with data, this book is an invaluable resource for mastering R and becoming a proficient data scientist.

Ender Yolagel

4 reviews, 6 followers

April 7, 2020

The holy book of core data science with R by Wickham and Grolemund. No matter how experienced you are, you will find many "tiny" things to learn from this book.

The book utilizes the "tidyverse" collection of packages with a coherent philosophy that sits behind them (tidyr for reshaping, dplyr for transformation, ggplot2 for visualisation, broom for linear models etc.) that dramatically speed up most of the common steps involved in an analysis.

Although I was familiar with those packages, the book taught me purrr (for functional programming) and how to better use the packages together. This improved my functional programming skills tremendously.

It also teaches you how to use the grammar of graphics, literate programming, and reproducible research to save time. Grolemund's Hands-On Programming with R is also a good companion for getting the most out of this book.

The modelling part is a little rudimentary. Most of the examples are just fitting independent regression models, whereas it seems to me that a hierarchical model would be a better fit. Still these are small things and it would be silly to expect a single book to cover all of these areas.

data-ml-technical

Ben

58 reviews, 2 followers

February 28, 2022

I read this book to get reacquainted with R after having not used it for a while. This book serves as a great refresher, particularly if you’re interested in the “tidyverse” way of doing things. The book does lack introductions to foundational principles in R, and I’m not sure I would recommend it as a first introduction. There’s some interesting but brief coverage of plotting with ggplot2 and visualizing basic model outputs, but for more coverage, you will need to look elsewhere (the book often includes good references).

The bulk of useful material is around munging data efficiently and in a way that is easy to understand (filtering, grouping, transforming, using various string and date manipulations). In my opinion, this is the most valuable change that the tidyverse suite brings—it’s much easier to write and read transformations written in this way than with base R indexing, etc. The libraries also provide a more opinionated and robust way to deal with messy data than the base R equivalents.

Karen

225 reviews, 12 followers

October 2, 2020

4.4

Wow, that was a very long 3 months of reading.

I have been wanting to get into research for a while, and data science seemed? like an entry point? Honestly, I don't really know what I'm doing.

Anyway, when I contacted a lab I was interested in, they took one look at my transcript and said, "Do you have any of the skills needed for data science?" And I was like, well, no. So they gave me this book, and I've been reading it ever since. (Putting a moderate time sink into learning skills has been a very productive theme of past summers.)

I read the free online version of the book, supplemented with Arnold's amazing documentation of exercise solutions. Some bits were rough, because it's definitely hard to predict readers' prior knowledge, and some of the exercises could be improved/modernized, but overall I thought this was an extremely accessible and well-written book, relatively speaking. I highly recommend the online format, because the lack of page numbers makes time go by faster, and it doesn't feel like you're reading a "real textbook" at all. I feel so much more confident now than I did at the beginning of the summer as a life sciences major with about one college statistics course's knowledge of R.

Also, on reflection a month later, I learned nothing about actual statistical techniques or modeling from this book. It is a fantastic resource for learning a certain approach to a very diverse language, but it will not replace your good old college statistics courses.

Disclaimer: I have a moderate amount of background experience in general programming, which really helped throughout. (For example, the authors do not bother to explain what anonymous functions really are, although they spend a significant amount of time on for loops.) I don't know if this is the right resource to start coding entirely, unless you're solely planning on working in R.

Vadim Yapiyev

25 reviews, 3 followers

December 13, 2020

Well, I specifically bought the paper book on Amazon to get through it. I cannot say that I enjoyed it, but it was useful for improving my skills in R for data analysis. I used it together with courses on DataCamp. The book has an interactive website made with R Markdown; the book itself is written with bookdown, which is briefly discussed at the end. The book follows the tidyverse way. So far it is the only useful book on R I could find for beginners. I was already at a roughly intermediate level in R and wanted to get more systematic in my R skills. I still cannot write my own model, but I will be able to soon. Overall, a good guide to R using RStudio, though most of R is learned not by reading books but by writing code. I copied code online from https://r4ds.had.co.nz/index.html, where most of the text and code chunks are given.

Siobhan

4,928 reviews, 592 followers

September 30, 2017

Hadley Wickham has created the guide to R. If you’re new to R, this book is a wonderful introduction. If you’re a regular R user, this book is great for those little details you wish to double check. If you’re returning to R after some time away, this book is a brilliant refresher course.

The one thing I will say, for those who are new to R, is that some elements of this can be a bit daunting when you first dive in. With R, it takes a while before you’re confident in your knowledge, and some of the things in this book require you to have a bit of faith in your capabilities.

A very useful read, no matter your R knowledge.

Gallottino

68 reviews

January 7, 2018

A great, extremely useful book for anyone who wants to get started with and go deeper into the R software and the many packages dedicated to producing graphics and analyses. Hadley Wickham is a genius.

A must-have book for anyone who wants to understand the basic approach to R and the packages included. Hadley Wickham is a genius. Surely there are some paragraphs that are not so clear, but this also depends on the specific application you are working on in your research field.
With this book and a lot of exercises you can have a powerful tool for your analysis.

lingua-inglese
