This is the second of a series of interviews by Reuben Pereira on topics related to Data Science as a career from the perspective of data science managers.
Reuben is a Data Scientist in the Advanced Analytics Garage at CARPROOF. He has experience in Machine Learning, Data Mining, Statistics, GIS Systems and Cloud Computing.
For the first interview, with Amir Feizpour - click here!
Communicating Data Science with Shane Dejong
In this talk we interview Shane Dejong, the Head of Data Science at the CARPROOF Product Innovation Lab in Kitchener, Ontario. Shane holds a Master’s of Science in Predictive Analytics from Northwestern University, as well as a CPA (Chartered Professional Accountant) designation.
Can you tell us about your role as the Head of Data Science at CARPROOF?
As the Director of Data Science, I’m focused on finding ways to improve our business and products with data science and machine learning. I work to ensure that my team has a clear path and the right resources to develop great data science products in a timely fashion. I provide high-level oversight for my team to ensure that they are taking the right approaches and that what they're doing is right for the intended application.
Can you tell us a bit about the data science community and market in the Kitchener-Waterloo (KW) region?
The KW area is one of the fastest growing tech hotspots in Canada. There are many big tech companies in the area. It started with BlackBerry in the early 2000s, and we now have other major players like Google, Facebook and Shopify.
In terms of the data science market, I’m seeing that companies are looking for candidates with data science and statistics skills as a standard, even for positions that are not directly data science focused, like software developers. As a result, most software developers are looking to develop their statistics and machine learning skills, while traditional statisticians or mathematicians are looking to develop their software development skills. So I see a blurring of the lines between software engineering and analytic skill sets, and an individual who possesses both will be highly desirable going forward.
We are going to shift gears a bit and go over a few questions related to your experience in hiring and managing data scientists. Based on your experience, what are the most important quantitative and programming skills you look for when hiring data scientists?
When hiring data scientists I typically start by looking at individuals with a quantitative background like statistics, computer science, applied mathematics, physics or business and economics (quantitative focus). Given the quantitative nature of the role, I’m looking for individuals who tend to be mathematically inclined. If an individual does not have a graduate degree in a quantitative field, relevant work experience is also helpful.
In terms of their technical skill sets, individuals need to have experience in using R or Python for data science and experience in working with data from structured and unstructured sources at a bare minimum. In addition, being able to understand, explain and implement standard statistical techniques and machine learning models is also essential. These are typically the bare essentials I look for.
How do you assess if a candidate has these quantitative and programming skills?
I’ve found that the best way to assess technical skills is with case-study type questions. I’ll typically present the candidate with a scenario where there is a technical or algorithmic challenge, and have them explain how they would solve the challenge. By doing this I can assess how the candidate responds and really gauge their technical maturity.
In addition to this, I am interested in whether a candidate is passionate about data science. As an example, I typically ask them about their hobbies and what they like doing in their spare time. If they demonstrate they are involved in Kaggle competitions, follow most of the popular data science blogs or have their own personal projects, I can see that the individual has a passion for the subject. I think this is important in order to stay up-to-date and relevant.
When we get towards the final stages of the interview process, we present the candidates with a case study. The candidate is typically given some time to work on the problem and has to present the results and methodology. This gives me an opportunity to assess the candidate's thought process and level of maturity in detail.
Have most of the candidates you’ve interviewed met these requirements?
I think it's generally a mixed pool. With more and more institutions offering these types of programs there are more graduates with these skills, but candidates with relevant experience to go along with these skills are harder to find.
How do you assess if a candidate has the right amount of prior business acumen?
This is generally a tough ask from a candidate if they don’t actually have a few years of experience working in a similar role. What I do look for, however, is someone who can connect the dots. What I mean by that is I’m interested in whether you can explain to a layman what model you built, why you built it, what it is going to be used for, and what its impact on the company is. I’m looking for candidates who can clearly and articulately explain their work and its impact, to both technical and non-technical stakeholders.
That makes complete sense. Also, as a data scientist you can’t work in a silo. You go into meetings and work with subject matter experts to understand what's going on with the data itself, because there can be data anomalies or important data dictionaries where you have to actually work with someone to know what's going on.
Yes, that’s why I think it can be challenging for candidates who come from a purely academic background, where they’ve mostly been involved in individual research. It’s a big change to go from spending 5 years working on a tough math problem or optimizing code to working with cross-functional teams with multiple competing priorities.
In your opinion, what’s the best way for a candidate to gain relevant experience and build a portfolio if they don’t have data science experience?
This is an interesting point because not only is it tough for people to get relevant work experience in data science, but it's also hard for companies to find people who have the relevant experience. In that sense, we are really fortunate to have a lot of free online resources like Kaggle or Open Data initiatives that candidates can use to build or augment their portfolio.
Okay, so in your opinion, if a junior or aspiring data scientist doesn’t have any prior experience in the field but has participated in a variety of Kaggle competitions and has shown that they can not only model but also do good data prep work, do you think this could be used to help build relevant experience?
I think it’s a fairly good resource. A lot of these Kaggle competitions are not always easy and can take a lot of time and effort to complete. If the candidate can successfully complete the project and explain it in a way that a non-technical person could understand, I think that’s great. Another thing I would suggest is to get more involved in the data science community. This could mean doing more competitions, going to Meetups or attending workshops on data science topics.
From your perspective as an interviewer, what does a successful interview look like?
I’m looking for candidates to articulately explain their prior projects, and the reasoning behind the steps they took for these projects. If they ran into a challenging situation and can explain how they found or implemented an innovative solution, that's always a plus.
What are some of the most common mistakes candidates make?
Exactly that: not being able to back up and articulately explain their experiences. For example, if I'm walking through a candidate's past experiences and they are vague about why they chose one model over another or how they handled data issues, that can be a red flag. Another thing that makes it hard for me to assess a candidate is if they don’t provide clear examples that demonstrate their skills.
In regards to preparing for the interview, I think the most important thing is to actually do some research and understand the company and the industry. Be curious about how this company uses data science or research their data science products. Research the competitors and find out about what they are doing. Go into the interview with a clear proposal of how you can help the company achieve their goals for data science. Lastly, don’t be afraid to present your own ideas. If you have an idea that might help, be sure to talk about it.
Do you have any other suggestions or advice for aspiring data scientists or those looking to break into the field?
Always be curious and willing to learn. Data Science is a rapidly changing speciality where new technologies and techniques are being developed on a weekly basis. So stay current and up-to-date with as much as you can and always be curious and willing to learn.