This is the first in a series of interviews by Reuben Pereira on data science careers, as seen from the perspective of data science managers.
Reuben is a Data Scientist in the Advanced Analytics Garage at CARPROOF. He has experience in Machine Learning, Data Mining, Statistics, GIS Systems and Cloud Computing.
Managing Data Science with Amir Feizpour
In this interview we pick the brain of Amir Feizpour, a Toronto-based Senior Manager of Data Science at the Royal Bank of Canada (RBC). Amir is also a scientific advisor at Sema Lab. Previously, Amir held a postdoctoral position at the University of Oxford. He holds a PhD in physics from the University of Toronto.
Can you tell us about your role as the Senior Manager of Data Science at RBC?
Sure. As Senior Manager of Data Science, I am primarily responsible for the following:
Developing advanced analytics models based on website and mobile data for the RBC Digital team
Leading community engagement activities of the Enterprise Data Science team. This includes leading internal and external workshops on machine learning, and contributing to the FutureMakers series, an RBC-led initiative to engage with the local technical community.
Participating in business engagements to start Advanced Analytics Labs
Leading business enablement by consulting with businesses on their data and analytics solution needs
Can you tell us a bit about the data science community and market in Toronto?
Toronto has a very large and diverse community of data science and machine learning practitioners and enthusiasts. I often attend professional events, and the number of new people I meet never ceases to amaze me. They are all interested in learning, teaching and being part of the community! In terms of data science jobs, there is a wide variety of opportunities, including positions in:
Research labs at Samsung, LG and Borealis
Startups of various sizes, including Deeplearni.ng, Adeptmind, Flipp and Shopify
Financial institutions such as RBC, Scotiabank, TD, CIBC and BMO, and big companies in other sectors
There are opportunities for all seniority levels, technical skill sets and aptitudes. There are even companies, such as Sharpestminds, that specialize in machine learning talent matching.
We are going to shift gears a bit and go over a few questions related to your experience in hiring and managing data scientists. Based on your experience, what are the most important quantitative and programming skills you look for when hiring data scientists?
Anyone we hire needs to be very comfortable with Python or R, and have at least a basic knowledge of Big Data platforms like Spark and Hadoop. For Python and R, we expect applicants to know and be able to use all the standard packages and libraries. We also expect them to have a very good understanding of machine learning techniques, probability theory and statistics. Communication skills are vital to the role and to our business, so a candidate with strong communication skills will definitely have an edge over the competition. Overall, candidates need to be able to talk through every step of the data science pipeline: data extraction, processing, model development and evaluation, and presentation of results.
How do you assess if candidates have these quantitative and programming skills?
In terms of assessing candidates, we look at a candidate’s prior experience. In the interview process we ask open-ended questions and observe the candidate’s ability to think through a problem and communicate their thought process. A common approach is to provide a data set and ask the candidate to spend a few days exploring the data and developing a model. The candidate then presents their work and justifies why they chose one method over another. This is usually a great way to assess candidates because it closely mirrors what data scientists encounter in their day-to-day work.
Have most of the candidates you’ve interviewed met these requirements?
Unfortunately, no. Good candidates are really hard to come by. We often see candidates who have only taken a few MOOCs apply for highly technical roles, not realizing that it takes a lot more training to get there. Most of the candidates we interview fail to demonstrate that they can effectively communicate important data science concepts, which leads us to believe that they might lack the level of expertise the position requires. If candidates want to be better prepared, I believe they have to be very serious about data science and immerse themselves in the field. I go into detail about what I mean by this in this post.
If a candidate wants to distinguish themselves, there are a few things they can do. Generally speaking, “generalist” data scientists are a lot more attractive in the job market. By “generalist” I mean candidates who have very strong math, statistics and machine learning skills, complemented by other key skills. As I mentioned before, communication is a big one: being able to communicate technical ideas to business stakeholders is a crucial skill. Other examples of important skills are software engineering, business domain knowledge, DevOps and project management.
How do you assess if a candidate has the right amount of prior business acumen?
This one is interesting, because most data scientists don’t really have it, and it is not a core part of standard data science training. I believe that prior business experience is definitely a differentiating factor, but what a data scientist brings to the table is their technical and quantitative skill set. This is partly why data science is so popular: you can switch industries completely and still be very successful. Data scientists usually work closely with business stakeholders, and each party relies on its strengths to provide real business value. Therefore, I think it’s crucial for data scientists to be able to understand the business need by asking the right questions, and to translate analysis results into actionable insights for the business.
In your opinion, what’s the best way for a candidate to gain relevant experience and build a portfolio if they don’t have data science experience?
Working on projects is by far the best way to gain experience. If a candidate is debating whether to work on a personal project or take additional coursework or certificates, I would recommend personal projects, because they allow candidates to demonstrate what’s special about them and their ability to make something work.
For candidates who have personal projects, I pay attention to the completeness of the project, especially the thoroughness of the analysis. They need to justify every decision they made. For example, if they built a model, I want to see why they handled missing values the way they did, and how they analyzed the characteristics of the data’s distribution.
If a candidate does not have direct data science experience, I think they first need to familiarize themselves with standard data science concepts and terminology. They can easily do this by talking to other data scientists, attending meetups, listening to podcasts, and reading books and articles. Candidates who have not worked directly in data science, but have used elements of it in their prior work, should focus on reframing their experience to highlight those components.
From your perspective as an interviewer, what does a successful interview look like?
An interview where both parties come out of it feeling they had a really good time. This is very important because, in addition to assessing a candidate’s technical competence, I’m looking to see whether this person will be a good fit for the team. So cultural and personality fit is very important.
What are some of the most common mistakes candidates make?
They don’t prepare. They don’t research the company and the team. They don’t have relevant questions to ask, and they don’t ask to meet the hiring manager and the team before accepting the offer.
I recommend that candidates learn as much as possible about the interviewer and the team they are applying to join. They should talk to as many people from the team as they can, research the company, and collect as much information about the position as possible. They should then formulate a set of relevant questions for the interviewer. Doing this research and preparing the question list gives them a better understanding of the interviewer’s perspective, and can even help them anticipate which questions to expect. For examples of questions candidates should ask, see this post.
Assuming a candidate has done their homework, the only other thing they need to do is stay calm and talk about things that interest them. One last tip: most candidates don’t realize that they can steer the conversation, especially to emphasize their strengths. The tricky part is doing this in a way that keeps the conversation relevant to the interviewer, which is where prior research and an understanding of the context are very important.
Do you have any other suggestions or advice for aspiring data scientists or those looking to break into the field?
The best advice I can give to aspiring data scientists is to think and communicate like a data scientist. Focusing on certificates and online courses will only take you so far; they don’t demonstrate what makes you unique or interesting. Take on personal projects and learn how to talk about them. Lastly, when preparing for an interview, get a clear understanding of what the team you are interviewing with does and what its problems are, and demonstrate that you can help solve those problems with your background and technical expertise.