How to be a great data scientist
Logikk engage exceptional humans for companies looking to unlock the potential of their data
This post was originally posted on Logikk's website here and has been published on analytics-link with their permission
Data scientists are, as most of us already know, in high demand. This, however, does not mean that any old data scientist will do – aiming to be the best in the market should be at the forefront of your, or your company’s, objectives. That being said, directing all your attention to becoming the fabled ‘unicorn’ (the data scientist who is an expert at all hard and soft skills) is not necessary. I therefore want to outline how to be a great data scientist.
It is better to be a specialist in one or two specific areas than to aim to master all and end up the ‘master of none’. A data scientist is part of a team, and, from a company’s perspective, it is much more useful to hire a team where each individual has their own specialisms to bring to the table, allowing the whole team to work as a cohesive, collaborative unit.
Unicorns don’t exist, or if they do, they are extremely rare. For companies, this means adjusting expectations accordingly, and for data scientists, focusing on becoming an expert in what you know and love best.
Though mastering every single intricate piece of data science is an almost unachievable task, there is an inarguable bank of skills that every data scientist must possess to be great at their job. Mastering all of these is what will set you apart.
Soft Skills And Hard Skills For Data Science Roles
We can divide the skill set required of all data scientists into two key areas: soft skills and hard skills. The soft skills tend to refer to one’s character – the more ‘qualitative’ side of a data science role. The hard skills are the applied, practical skills that you need to know to do the job itself to a good standard.
Hard Skills For Data Scientists
Before we get on to nailing down the person you need to be to handle the uniquely demanding requirements of a data science role, let’s look at what you need to know objectively, to be a great data scientist in the first place.
1. Programming & Statistical Analysis Knowledge
Firstly, you need to know your programming languages. Python (of course), C++, SQL, and Java should be on your tick list, but a willingness to expand your range by adding as many different languages to your arsenal as possible is what will set you apart.
On the statistical analysis and big data front, knowledge of Hadoop and SAS comes first, but your knowledge should not be limited to these. Spark, Pig, R and Hive are also among the most popular tools. Master all of these to distinguish yourself. If possible, seek to gain certifications in as many of these as you can to bolster your CV and confirm your expertise.
Being able to program changes what you can do with your statistical analysis knowledge, opening more possibilities and thus broadening the scope of your abilities. These two skills go hand in hand. A thorough understanding of statistics is obligatory, but without the programming skills necessary to apply that knowledge, you really only bring half a capability to the role.
The analysis of large datasets, which often exceed one million rows, is more important than ever, while the ability to solve problems within your company by creating your own tools is an application of programming knowledge that will come up time and again in a data science role. Approaching a data science career without programming or knowledge of statistical analysis is a pointless exercise.
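As a rough illustration of how programming and statistics work together, the sketch below uses SQL to summarise a table and Python to measure the spread within one group. It is a minimal example, not a recommended workflow: the orders table, its columns, and every figure in it are invented for illustration, and an in-memory SQLite database stands in for whatever data store a company actually uses.

```python
import sqlite3
from statistics import stdev

# Hypothetical data: an in-memory table of orders, invented for this example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [
        ("north", 120.0),
        ("north", 80.0),
        ("south", 200.0),
        ("south", 220.0),
        ("south", 180.0),
    ],
)

# SQL does the grouping and averaging inside the database.
summary = conn.execute(
    "SELECT region, COUNT(*), AVG(amount) "
    "FROM orders GROUP BY region ORDER BY region"
).fetchall()

# Python picks up where SQL stops: the sample standard deviation for one region.
south_amounts = [
    amount
    for (amount,) in conn.execute(
        "SELECT amount FROM orders WHERE region = ?", ("south",)
    )
]
south_spread = stdev(south_amounts)
conn.close()

for region, order_count, average in summary:
    print(f"{region}: {order_count} orders, average {average:.2f}")
print(f"south spread (sample standard deviation): {south_spread:.2f}")
```

The same division of labour tends to hold on much larger tables: let the database aggregate, then bring only the rows you need into Python for the statistics.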
2. Quantitative Analysis
Being able to understand the behaviour of complex systems by analysing the data a system produces is central to a data scientist’s job. Experimenting with different approaches to a problem, testing out hypotheses, is a vital part of how data scientists explore possible solutions to new or complex roadblocks. Analysing the outcome of these experiments doesn’t always yield accurate results, but a data scientist who can reduce error through quantitative analysis of an experiment’s outcomes can reveal a deeper insight into what went wrong and what can be done to improve the chances of success in the next iteration.
A data scientist with good quantitative analysis skills will be able to model complicated systems (e.g. supply/demand forecasting, economic optimisation) to ascertain the most valuable routes to follow in a wide range of scenarios. The ability to improve business processes and, ultimately, profit, is an essential skill companies will be keen to harness.
Finally, the use of quantitative analysis to improve the output of machine learning models vastly improves the performance of machine learning and thus its value to a company. By applying their analysis skills to a machine learning model, a good data scientist will be able to streamline work processes across the company, saving time and thus financial cost. The bottom line: quantitative analysis equals efficiency.
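To make the idea of analysing an experiment's outcome a little more concrete, here is a minimal sketch that compares two forecasting attempts against what actually happened. All of the numbers are invented for illustration, and mean absolute error and mean bias are just two of many possible measures, chosen here as assumptions rather than as the standard approach.

```python
from statistics import mean

# Hypothetical figures, invented for illustration: actual weekly demand
# and the forecasts produced by two iterations of a model.
actual = [100, 120, 130, 90, 110]
forecast_v1 = [90, 100, 120, 80, 95]
forecast_v2 = [98, 125, 128, 93, 108]


def mean_absolute_error(observed, predicted):
    """Average size of the miss, ignoring its direction."""
    return mean(abs(o - p) for o, p in zip(observed, predicted))


def mean_bias(observed, predicted):
    """Average signed miss: negative means the forecast runs low."""
    return mean(p - o for o, p in zip(observed, predicted))


for name, forecast in (("v1", forecast_v1), ("v2", forecast_v2)):
    mae = mean_absolute_error(actual, forecast)
    bias = mean_bias(actual, forecast)
    print(f"{name}: mean absolute error {mae:.1f}, mean bias {bias:+.1f}")
```

In this made-up case the first forecast is not just less accurate: every miss is in the same direction, which points to a systematic problem that can be corrected in the next iteration rather than random noise.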
3. Unstructured Data Management
Data, of course, comes in all shapes and sizes. Some estimates claim that up to 80% of business data is unstructured, in the form of everything from PowerPoint presentations to audio files.
Unstructured data can be challenging to work with and can arrive from a multitude of channels, sometimes simultaneously. Incoming data is rarely neat, so the ability to understand, organise, and clean data is another hard skill (and one that will challenge your soft skills) which no good data scientist can be without.
This is where skills in building bespoke platforms will come in handy, as well as the ability to customise Hadoop and other open-source tools.
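For a sense of what understanding, organising, and cleaning data can look like in practice, here is a small sketch that tidies a handful of messy records. The records, field names, and accepted date formats are all hypothetical, and real incoming data will need rules of its own; treat this as one possible starting point rather than a general-purpose cleaner.

```python
import re
from datetime import datetime

# Hypothetical raw records, invented for illustration only.
RAW_RECORDS = [
    {"customer": "  Alice SMITH ", "signup": "2021-03-04", "spend": "£1,200.50"},
    {"customer": "bob jones", "signup": "04/03/2021", "spend": "300"},
    {"customer": "", "signup": "not recorded", "spend": "n/a"},
]

# Assumption: dates arrive either as ISO (YYYY-MM-DD) or as day/month/year.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y")


def parse_date(text):
    """Try each known format in turn; return None if none of them fits."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue
    return None


def parse_amount(text):
    """Strip currency symbols and separators; return None if nothing numeric is left."""
    digits = re.sub(r"[^0-9.\-]", "", text)
    try:
        return float(digits)
    except ValueError:
        return None


def clean_record(record):
    """Normalise one record, marking anything unreadable as None."""
    name = " ".join(record["customer"].split()).title()
    return {
        "customer": name or None,
        "signup": parse_date(record["signup"]),
        "spend": parse_amount(record["spend"]),
    }


cleaned = [clean_record(record) for record in RAW_RECORDS]
# Keep only the records where every field could be recovered.
usable = [r for r in cleaned if all(value is not None for value in r.values())]

for record in usable:
    print(record)
```

Marking unreadable values as None, instead of guessing at them, keeps the gaps visible so that someone can decide later whether to repair or discard those records.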
Soft Skills For Data Scientists
The technical skills outlined above are the foundations upon which a good data scientist is built. However, without qualities inherent in your character that complement the demanding nature of the role, you will falter. These are known as ‘soft skills’, and refer to the personality traits you must either be born with or have cultivated through experience and determination.
1. Problem Solving
If you boil data science down to the bare bones, you’ll find problems to be solved in the very marrow of the role. Problem-solving is one of those soft skills that a data scientist should be born with. It is unlikely that a person would even choose a role in data science without an instinct to solve complicated problems. If problem-solving doesn’t whet your appetite, maybe being a data scientist isn’t for you.
Experimentation with different models and the exploration of diverse ideas will form a large part of your day to day work as a data scientist. The simple fact is that many of these experiments will fail – and that is okay. You must be comfortable with failure; from failure, a data scientist will learn a more effective route to success, and also prepare themselves for future experiments by knowing what did and didn’t work on previous explorations.
As with resilience in the face of failure, the patience to understand and overcome challenges is critical. The patience to trawl through vast reams of data, and to spend long stretches of time working on data batches and writing and checking code, is a skill that only a particular breed possesses.
Why has the system generated these results? What caused this experiment to fail? Why did it succeed? What other data sources can I bring in to enhance the accuracy of my results? A great data scientist questions everything, even themselves. They ask questions even when those questions might sound basic or stupid, to challenge their assumptions and to clarify the specifics of a problem. The opportunity to be continually seeking to solve new and challenging issues and the compulsion to find ever more innovative ways to approach them is what a data scientist thrives on.
The compulsion to innovate in order to find new avenues to success is a symptom of a creative mindset. A data scientist will come across roadblocks regularly, and will always be driven to find creative ways to get around or through any limitations placed in their way. Creativity takes many forms, and technical roles like data science require a certain brand of creativity.
Creativity is also key to effective communication, another skill that is entirely necessary for a data scientist.
Often, a data scientist will be required to communicate their problems, methodology, and findings to other professionals within a company who do not possess the same skill set or understanding as themselves. The ability, therefore, to translate and contextualise their work by finding common ground and communicating in creative ways, telling stories and using metaphor to allow others to visualise what they are being told, should not be underestimated.
Equally, it is important to note that no data scientist is an island. Everybody works as part of a team, and the secret to great teamwork is excellent communication. Hoarding knowledge is a big no-no, for it impedes progress to solutions and prevents the team from working as a whole, bringing their own specialisms to a problem in order to solve it.
7. Technical Acumen
This should go without saying. Technical acumen is necessary to master the hard skills discussed above adequately. This skill must come naturally to a data scientist. Best practice in data science is in constant evolution. Developments in machine learning, deep learning, and the ballooning wealth of data becoming available mean that a data scientist’s technical knowledge must evolve rapidly to keep up. This can only happen with astute technical ability and a passion for continuous learning.
Finally, a good data scientist is conscious that their team forms an essential part of a company’s processes, but it is, essentially, just a part. They can, therefore, look beyond their own screen to the bigger picture: how their output is handed over to the rest of the company, put to applied use and eventually helps end users of the company’s products and services. Again, curiosity is necessary here – understanding of and interest in the end user will inform the data scientist about how best to conquer the tasks set before them.
Departments within any company must co-operate seamlessly so, as with any role, the ability to thrive in a team environment with spirit and enthusiasm is what sets a proficient data scientist apart from a great one. In this fast-paced, challenging role, only the latter need apply.
Do you have anything to add on how to be a great data scientist? Steve would love to know your thoughts on what makes a great data scientist. You can email him at Steve@logikk.com or tweet him @SteveAtLogikk.