Our first SPOTLIGHT article! We had the pleasure of running 10 questions by Hadley Wickham to get a better understanding of what makes the most famous name in the world of R tick!
For those who know his work, you'll no doubt appreciate how much easier your life is due to his amazingly useful and logical R packages.
For those who don't know the name Hadley Wickham yet, here's a brief intro:
Hadley is Chief Scientist at RStudio, and an Adjunct Professor of Statistics at the University of Auckland, Stanford University, and Rice University. In his own words, "I build tools (computational and cognitive) that make data science easier, faster, and more fun".
He is most commonly known for his work creating packages in R. On his website, he breaks these down into three key categories:
Data Science:
ggplot2 for visualising data
dplyr for manipulating data
tidyr for tidying data
stringr for working with strings
lubridate for working with date/times
Data Import:
readr for reading .csv and fwf files
readxl for reading .xls and .xlsx files
haven for SAS, SPSS, and Stata files
httr for talking to web APIs
rvest for scraping websites
xml2 for importing XML files
Software Engineering:
devtools for general package development
roxygen2 for in-line documentation
testthat for unit testing
On top of this work, he has authored multiple Data Science books (find out more on his site), and there are even sites dedicated to his body of work, which has become widely known as the "Hadleyverse".
Hopefully, that's enough of an intro! Let's dive into the interview...
1. Hadley, you're very invested in R these days having spent many years working with it, contributing packages, and authoring books - but when did you first encounter R and what made it so engaging to you?
I first encountered R in one of my statistics courses during my BSc at the University of Auckland. I still strongly recall my initial impression of R. I had a programming background and R did things that blew my mind. One of the most mind-expanding experiences was running plot(x, sin(x)) and seeing x and sin(x) on the axes. If you've only used R you take this sort of behaviour for granted, but it was unlike anything I had ever seen before. The challenge of understanding what was going on put my feet on the road to the depths of R.
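For readers who want to see the behaviour Hadley describes, here is a minimal sketch in R. The helper axis_labels() is hypothetical (it is not part of base R); it only illustrates the general idea that a function can capture the expressions it was called with, which is roughly how plot() arrives at its default axis labels.

```r
# Hypothetical helper, for illustration only: capture the expressions
# supplied as arguments and turn them into text, rather than using
# the values those expressions evaluate to.
axis_labels <- function(x, y) {
  c(x = deparse(substitute(x)), y = deparse(substitute(y)))
}

x <- seq(0, 2 * pi, length.out = 100)

# The labels come from the code that was typed, not from the data.
labels <- axis_labels(x, sin(x))
print(labels)

# The call from the interview: the axes are labelled "x" and "sin(x)".
plot(x, sin(x))
```

In most other languages a function only ever sees the values of its arguments, which is why this behaviour can be so surprising to someone arriving from a programming background.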
2. In this industry it can sometimes be hard to put into words, what it is that we do exactly. When you're at a BBQ (a quick look at your Instagram suggests you're a fan...) and someone asks "So Hadley, what do you do?" - how do you respond?
I don't have a great answer for this. If the person is somewhat technically savvy, I'll say I develop open source software for data scientists (and if I'm in NZ, I'll also mention that I do this in R). Otherwise, I say I do statistics and programming. That's usually enough to kill the conversation :P
3. What does a typical day of work look like for you right now - what are you working on?
I'm currently finishing up the dplyr 0.6.0 release which I've been spending a lot of time on lately. dplyr is a popular package so there's a huge amount of work that goes into ensuring that we haven't introduced new bugs, and making sure that if we've deliberately changed how a function works that everyone knows about it and how to update their code. Now that I have a team at RStudio (Gabor, Jenny, Jim, Lionel, and Max) more of my week is taken up with management. Fortunately they're all awesome so it's mostly making sure that they're not blocked and helping to set priorities across the whole group. However, I still love independent programming and writing and want to do plenty of it - I now avoid meetings on Tuesdays and Thursdays so I have enough time to deeply immerse myself in technical challenges.
4. Your latest book R For Data Science was published earlier this year, what does it cover and who might benefit from grabbing a copy?
R4DS is my attempt to explain the key tools that allow you to do data science in R. It describes the tidyverse, a collection of packages designed to work well together, and make data analysis more efficient, more expressive, and more fun. The book should be accessible to the newcomer to R, but you should also learn a bunch even if you've been using R for a while. I'd recommend it to anyone interested in data science, especially since you can read it for free on the web: http://r4ds.had.co.nz
5. You speak a lot about the use of the Rcpp package which makes it simple to connect C++ to R. At a high level, what are the main advantages, and what type of use-cases often gain the most from utilising this package?
Rcpp is a great tool for when you have a problem where the bottleneck is the computer, not the human. Sometimes you can make your R code a lot faster with a little careful thought; if you can do that it's usually worth it. But sometimes making your R code faster involves torturing the code to a point where you can barely understand it. Don't torture your code! Instead, learn a little C++ and switch to a language designed for performance, rather than a language designed for flexibility. I think you can get enough of the basics to be useful in under a week. Start at http://adv-r.had.co.nz/Rcpp.html
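To give a flavour of what this looks like in practice, here is a small sketch that assumes the Rcpp package and a working C++ compiler are installed. The function name sum_cpp is made up for illustration, and base R's sum() is already fast, so treat this as a demonstration of the workflow rather than a real optimisation.

```r
library(Rcpp)

# Compile a small C++ function and expose it to R under the same name.
# sum_cpp is an illustrative name, not something shipped with Rcpp.
cppFunction('
double sum_cpp(NumericVector x) {
  double total = 0;
  int n = x.size();
  for (int i = 0; i < n; ++i) {
    total += x[i];  // an explicit loop is cheap in C++
  }
  return total;
}')

x <- runif(1e6)

# Compare against base R; all.equal() allows for floating point rounding.
print(all.equal(sum_cpp(x), sum(x)))
```

The pay-off tends to come with loops that are hard to vectorise in R, where each step depends on the result of the previous one.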
6. How do you see R & RStudio evolving in the next 5 years?
I wish I could even predict how the tidyverse would be evolving in the next 3 months, let alone R and RStudio in the next five years! I'm probably a bit too close to the ground to spot the bigger trends. But here are a few ideas:
* We'll continue to see R thrive as an interface language that can talk to other specialised systems (like SQL, spark, tensorflow, html/js, ...)
* RStudio will keep working to remove incidental frustrations that make data analysis harder than it should be. This includes things like removing frictions around installing external tools, and setting up reproducible analysis environments.
* The tidyverse will slowly get more consistent across packages so that once you've mastered one package you'll find mastering the others easier. In the near term this involves rolling out tidyeval (appearing for the first time in dplyr 0.6.0) everywhere. I'll also keep working to make explicit the principles that underlie the tidyverse so that others can use them in their own work.
7. What recent developments in the wider industry (i.e. new tools, cloud computing capabilities, deep learning etc) do you find particularly intriguing at the moment?
One really interesting development (somewhat related to deep learning) is that there are many data science challenges that can only be solved with high performance computing in a language like C. But you can't do data science in a language like that: you need something like R or Python. So we're starting to see more tools that explicitly acknowledge that - they implement the high performance stuff in a language optimised for computers and provide friendly wrappers in languages optimised for humans. Some recent examples from the top of my mind are arrow, Stan, and mxnet.
8. You're clearly very talented when it comes to writing, as well as logically compiling and programming code/packages - but what part of your work do you not find quite as natural?
Honestly, writing is still the hardest part of my job, and I wish I could write prose as easily as I write code! But it is fundamentally a harder job: it's easy to tell if a computer has understood what you want; it's much harder to tell if a human has. My best writing is the result of many rewrites, and from talking to people who have read it and who didn't understand something. It's also surprisingly hard to notice the knowledge that you're assuming. For example, I recently realised that R4DS spends a whole chapter discussing data import, but at no point do we actually show you what a csv file looks like, or how to tell if you have one! (Obviously a big hint is the extension, but extensions often lie.)
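In that spirit, here is a minimal sketch of what a csv file looks like on disk and one way to peek at a file before importing it. The file contents are made up for illustration, and the example uses only base R (readr's read_csv() plays the same role in the tidyverse).

```r
# Write a tiny, made-up csv file to a temporary location.
path <- tempfile(fileext = ".csv")
writeLines(c("id,label,value",
             "1,a,3.5",
             "2,b,4.25"), path)

# A csv file is plain text: one record per line, fields separated by commas.
# Reading the first few raw lines shows this regardless of the extension.
cat(readLines(path, n = 3), sep = "\n")

# Once it looks like comma-separated text, import it as a data frame.
df <- read.csv(path)
str(df)
```

Looking at the raw lines first is a cheap check when the extension cannot be trusted.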
9. Your Instagram page shows off some pretty nifty cakes and cocktails - any recent creations or discoveries you can recommend?
Nothing too exciting. I've been too busy with other projects :( I continue to love the negroni and all variations. I've recently bought a joule (a sous vide gadget), and I'm excited to use that in my cooking.
10. Lastly, (sneaking three questions into one) - How often do you get back to New Zealand, what do you miss when you're away, and what's your favourite place to be when you're back?
Usually once or twice a year. The thing I miss most from NZ is the food. It's just so easy to get great food everywhere you go! The thing I miss second most is the beaches, so I make an effort to explore beaches in different parts of NZ every time I go back.
We hope you enjoyed the interview and getting to know a bit more about the man who has provided so much to the R community! Please share using the social buttons below and look out for our next edition of SPOTLIGHT!