© 2018 - Analytics Link

Our interview with (the) Hadley Wickham

September 27, 2017

Our first SPOTLIGHT article! We had the pleasure of running 10 questions by Hadley Wickham to get a better understanding of what makes the most famous name in the world of R tick!

 

For those who know his work, you'll no doubt appreciate how much easier your life is due to his amazingly useful and logical R packages.  

 

For those who don't know the name Hadley Wickham yet, here's a brief intro:

 

Hadley is Chief Scientist at RStudio, and an Adjunct Professor of Statistics at the University of Auckland, Stanford University, and Rice University.  In his own words, "I build tools (computational and cognitive) that make data science easier, faster, and more fun".

 

He is most commonly known for his work creating packages in R.  On his website, he breaks these down into three key categories:

 

Data Science:

  • ggplot2 for visualising data

  • dplyr for manipulating data

  • tidyr for tidying data

  • stringr for working with strings

  • lubridate for working with date/times

 

Data Import:

  • readr for reading .csv and fwf files

  • readxl for reading .xls and .xlsx files

  • haven for SAS, SPSS, and Stata files

  • httr for talking to web APIs

  • rvest for scraping websites

  • xml2 for importing XML files

 

Software Engineering:

  • devtools for general package development

  • roxygen2 for in-line documentation

  • testthat for unit testing

 

On top of this work, he has authored multiple data science books (more information is available on his site), and there are even sites dedicated to his body of work, which has become widely known as the "Hadleyverse".

 

Hopefully, that's enough of an intro!  Let's dive into the interview...

1. Hadley, you're very invested in R these days having spent many years working with it, contributing packages, and authoring books - but when did  you first encounter R and what made it so engaging to you?  

 

I first encountered R in one of my statistics courses during my BSc at the University of Auckland. I still strongly recall my initial impression of R. I had a programming background and R did things that blew my mind. One of the most mind expanding experiences was running plot(x, sin(x)) and seeing x and sin(x) on the axes. If you've only used R you take this sort of behaviour for granted, but it was unlike anything I had ever seen before. The challenge of understanding what was going on put my feet on the road to the depths of R.
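For readers wondering how the axes end up labelled x and sin(x): an R function can capture the unevaluated expressions passed to it as arguments. The sketch below illustrates the idea with a hypothetical helper, axis_labels(), which is not part of base R. It is meant as an illustration of the mechanism, not as the actual source of plot().

```r
# Hypothetical helper (not part of base R): returns the text of the
# expressions supplied as arguments, rather than their values.
axis_labels <- function(x, y) {
  c(xlab = deparse(substitute(x)), ylab = deparse(substitute(y)))
}

x <- seq(0, 2 * pi, length.out = 100)

# The helper sees the code that was typed, not just the numbers.
labels <- axis_labels(x, sin(x))
print(labels)

# Draw the plot from the interview only when a person is at the console.
if (interactive()) {
  plot(x, sin(x))
}
```

Most programming languages evaluate sin(x) before the function ever sees it, so the function receives only numbers. R lets the function look at the expression as well, which is why the axis labels can match the code.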


2. In this industry it can sometimes be hard to put into words what it is that we do exactly. When you're at a BBQ (a quick look at your Instagram suggests you're a fan...) and someone asks "So Hadley, what do you do?" - how do you respond?


I don't have a great answer for this. If the person is somewhat  technically savvy, I'll say I develop open source software for data  scientists (and if I'm in NZ, I'll also mention that I do this in R).  Otherwise, I say I do statistics and programming. That's usually  enough to kill the conversation :P 


3. What does a typical day of work look like for you right now - what are  you working on?

 
I'm currently finishing up the dplyr 0.6.0 release which I've been spending a lot of time on lately. dplyr is a popular package so there's a huge amount of work that goes into ensuring that we haven't introduced new bugs, and making sure that if we've deliberately changed how a function works that everyone knows about it and how to update their code.
Now that I have a team at RStudio (Gabor, Jenny, Jim, Lionel, and Max) more of my week is taken up with management. Fortunately they're all awesome so it's mostly making sure that they're not blocked and helping to set priorities across the whole group. However, I still love independent programming and writing and want to do plenty of it - I now avoid meetings on Tuesdays and Thursdays so I have enough time to deeply immerse myself in technical challenges.


4. Your latest book, R For Data Science, was published earlier this year. What does it cover, and who might benefit from grabbing a copy?

 

R4DS is my attempt to explain the key tools that allow you to do data  science in R. It describes the tidyverse, a collection of packages  designed to work well together, and make data analysis more efficient,  more expressive, and more fun. The book should be accessible to the  newcomer to R, but you should also learn a bunch even if you've been  using R for a while. I'd recommend it to anyone interested in data  science, especially since you can read it for free on the web:  http://r4ds.had.co.nz


5. You speak a lot about the use of the Rcpp package which makes it simple  to connect C++ to R. At a high level, what are the main advantages, and  what type of use-cases often gain the most from utilising this package?  

 

Rcpp is a great tool for when you have a problem where the bottleneck is the computer, not the human. Sometimes you can make your R code a lot faster with a little careful thought; if you can do that it's usually worth it. But sometimes making your R code faster involves torturing the code to a point where you can barely understand it. Don't torture your code! Instead, learn a little C++ and switch to a language designed for performance, rather than a language designed for flexibility. I think you can get enough of the basics to be useful in under a week. Start at http://adv-r.had.co.nz/Rcpp.html
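As a rough illustration of what that switch can look like, here is a minimal sketch. It assumes the Rcpp package and a working C++ compiler are installed. The function names sum_r and sum_cpp are made up for this example; Rcpp::cppFunction() compiles the C++ snippet and makes it callable from R.

```r
# Plain R version: an explicit loop, written for clarity rather than speed.
sum_r <- function(x) {
  total <- 0
  for (value in x) {
    total <- total + value
  }
  total
}

# C++ version via Rcpp. Skipped when Rcpp is not installed.
if (requireNamespace("Rcpp", quietly = TRUE)) {
  Rcpp::cppFunction("
    double sum_cpp(NumericVector x) {
      double total = 0;
      for (int i = 0; i < x.size(); ++i) {
        total += x[i];
      }
      return total;
    }
  ")

  x <- runif(1e5)
  # Both versions should agree up to floating point tolerance.
  stopifnot(isTRUE(all.equal(sum_cpp(x), sum_r(x))))
}
```

The loop is the same in both languages; only the language it runs in changes. For a real sum you would simply call the built-in sum(), so treat this as a template for loops that have no vectorised equivalent.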

 

 

6. How do you see R & RStudio evolving in the next 5 years?  

 

I wish I could even predict how the tidyverse would be evolving in the  next 3 months, let alone R and RStudio in the next five years! I'm  probably a bit too close to the ground to spot the bigger trends. But  here are a few ideas:

 
* We'll continue to see R thrive as an interface language that can talk to other specialised systems (like SQL, spark, tensorflow, html/js, ...)

 

* RStudio will keep working to remove incidental frustrations that make data  analysis harder than it should be. This includes things like removing  frictions around installing external tools, and setting up reproducible  analysis environments. 


* The tidyverse will slowly get more consistent across packages so that once you've mastered one package you'll find mastering the others easier. In the near term this involves rolling out tidyeval (appearing for the first time in dplyr 0.6.0) everywhere. I'll also keep working to make explicit the principles that underlie the tidyverse so that others can use them in their own work.


7. What recent developments in the wider industry (i.e. new tools, cloud  computing capabilities, deep learning etc) do you find particularly  intriguing at the moment? 

 

One really interesting development (somewhat related to deep learning) is that there are many data science challenges that can only be solved with high performance computing in a language like C. But you can't do data science in a language like that: you need something like R or Python. So we're starting to see more tools that explicitly acknowledge that - they implement the high performance stuff in a language optimised for computers and provide friendly wrappers in languages optimised for humans. Some recent examples from the top of my mind are arrow, Stan, and mxnet.


8. You're clearly very talented when it comes to writing, as well as  logically compiling and programming code/packages - but what part of your  work do you not find quite as natural?


Honestly, writing is still the hardest part of my job, and I wish I could write prose as easily as I write code! But it is fundamentally a harder job: it's easy to tell if a computer has understood what you want; it's much harder to tell if a human has. My best writing is the result of many rewrites, and from talking to people who have read it and who didn't understand something. It's also surprisingly hard to notice the knowledge that you're assuming. For example, I recently realised that R4DS spends a whole chapter discussing data import, but at no point do we actually show you what a csv file looks like, or how to tell if you have one! (Obviously a big hint is the extension, but extensions often lie.)
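To fill that gap for readers here, the sketch below writes a tiny csv file, peeks at its raw lines, and then reads it using base R only. The file contents are invented for illustration, and the check is a crude heuristic of our own (not something from R4DS): every line should split on commas into the same number of fields.

```r
# A csv file is plain text: one record per line, fields separated by commas,
# usually with a header line first. These three lines are made-up data.
csv_lines <- c(
  "id,label,score",
  "1,a,0.5",
  "2,b,1.5"
)

# Save it with a misleading extension, because extensions often lie.
path <- tempfile(fileext = ".txt")
writeLines(csv_lines, path)

# Peek at the raw text instead of trusting the file name.
first_lines <- readLines(path, n = 3)
print(first_lines)

# Crude heuristic: more than one field per line, and the same count on each.
fields_per_line <- lengths(strsplit(first_lines, ",", fixed = TRUE))
looks_like_csv <- all(fields_per_line > 1) &&
  length(unique(fields_per_line)) == 1

# If it looks like csv, read it as csv regardless of the extension.
df <- read.csv(path)
print(df)
```

The heuristic will be fooled by quoted fields that contain commas, or by files that use semicolons or tabs as separators, so looking at the first few lines yourself remains the most reliable test.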


9. Your Instagram page shows off some pretty nifty cakes and cocktails - any  recent creations or discoveries you can recommend? 


Nothing too exciting. I've been too busy with other projects :( I  continue to love the negroni and all variations. I've recently bought  a joule (a sous vide gadget), and I'm excited to use that in my  cooking. 


10. Lastly, (sneaking three questions into one) - How often do you get back  to New Zealand, what do you miss when you're away, and what's your favourite  place to be when you're back? 


Usually once or twice a year. The thing I miss most from NZ is the  food. It's just so easy to get great food everywhere you go! The thing  I miss the second most are the beaches so I make an effort to explore  beaches in different parts of NZ every time I go back. 

We hope you enjoyed the interview and getting to know a bit more about the man who has provided so much to the R community!  Please share using the social buttons below and look out for our next edition of SPOTLIGHT!

 

 
