Chapter 4
The Data Scientist’s Mindset

People tend to have a very superficial view of what a data scientist is (if they can even distinguish the role from that of the data analyst or the traditional scientist). This is clearly reflected in the books and articles that are available today on this role [18]. Rarely will you find a text that attempts to go deeper into what a data scientist really is.

Like a member of any profession, a data scientist is characterized not just by a set of skills but by a particular set of traits, qualities, ways of thinking, and ambitions. Let us look at these key aspects of the mindset one by one in order to understand it better and build a framework for what being a data scientist really is.

4.1 Traits

A data scientist has a variety of professional characteristics and traits that usually reflect the kind of work he specializes in, so this list is not set in stone and is more of a guideline to understand this role better. First and foremost, a data scientist has a healthy curiosity about the things he observes, such as potential patterns or relationships between two attributes or features, unusual distributions, etc. If you want to be a data scientist worth the money you earn, you need to have an inquiring mind.

This does not mean that you need to be curious about everything and get lost in perpetual random quests for answers. Curiosity has to be accompanied by the discipline to focus on down-to-earth, long-term interests that are more grounded than a fleeting curiosity, which can be impulsive and superficial. A data scientist is interested in the phenomena he observes in the data he deals with, wanting to get to the bottom of them. A statistical analysis of what’s there may be a good first step for him, but he is not satisfied until he has a good explanation for these phenomena, the root cause behind the statistical metrics he calculates. This allows him to explain the root cause to other people in the company in the form of a story.

Fig. 4.1 Curiosity is a very useful trait to have as a data scientist.

This leads to another trait that is somewhat akin to curiosity: an interest in experimentation. Namely, the data scientist has the courage and the imagination to try out new things, develop new ideas and put them into practice, design experiments and validate new notions that he develops. He is not afraid to build a model that no one else has built before, always being fully aware of the risks in terms of resource usage, etc. All this is a disciplined and practical form of experimentation where the ideas stem from the data available. Otherwise, there is the risk of misusing the data to project notions that are not there, a common mistake among data analysts lacking scientific discipline in their work. Experimentation is crucial, though, because it allows the data scientist to find new ways of interpreting the data and helping it transmute into information that can be useful to other people. This is an important point. The output of the experiments needs to be understandable to the non-technical members of his team; otherwise, it is probably immature. So experimentation is applied on many levels. Representation of results is one of them, and although not the most intellectually challenging to the data scientist, it is definitely no less important than the other tasks he undertakes.

Other traits that the data scientist has are creativity and systematic work. These are mentioned together because they are often applied together in data science and are equally important. The data scientist is an artist of sorts, in the sense that he is involved in design and other creative endeavors in his line of work. He values out-of-the-box thinking and regularly applies it to the problems he tackles. Although knowledgeable in various data analysis methods, he is not restricted by this palette of methodologies. Instead, he may use a combination of them, or even something completely new, tailored for the particular problem he faces. This is an important aspect of the data scientist that distinguishes him from a traditional data analyst and statistician. Creativity goes hand in hand with experimentation, making it an organic growth approach to tackling problems. Without creativity, experimentation may not quickly lead to results (think of a scientist researching a treatment for a disease; without creativity he may have to spend a large amount of time and other resources trying out potential solutions, many of which he could avoid testing altogether by applying a more creative and efficient approach). Creativity is, therefore, invaluable to the data scientist and a fundamental aspect of his thinking.

The data scientist is not, however, an artist per se. That’s why every creative thought he has is accompanied by several not-so-creative actions. This is where systematic work comes in. Think of any inventor (e.g., Thomas Edison) and how many hours of often tedious work were spent on honing and applying their creative insight. In a sense, having a particularly creative idea is not all that difficult. Finding one that is applicable to a given problem is very creative but still not too challenging, either. However, putting this idea into practice, working out all the engineering details of it, and getting useful results in a manageable timeframe: that is a real accomplishment. This is feasible through systematic work, which is not just hard work, but work done in a methodical and efficient way, something typical of any type of scientific endeavor. The trait of working systematically expresses itself as the discipline, organization and rhythm through which the data scientist manages to ground the creative ideas he comes up with.

Last, but certainly not least, among the essential traits of the data scientist is communication. Data science is not an ad hoc field. It is an interdisciplinary one, and, as such, it is closely connected to other fields. In the data scientist role, this translates into a series of connections or collaborations with other professionals in the organization. These professionals usually come from a variety of specialties and may have a different understanding of the various levels of the information the data scientist deals with. For the data scientist to be good at his role, he needs to be able to explain not only his methodology and results to his colleagues and his managers, but also the value of the whole process. It is this connectivity to other people that gives value to the data scientist’s role. You don’t do data science on your own (unless you are just practicing). Besides, requirements and problem parameters are not always clear cut; they often need to be defined through a process of interviews with other professionals in the organization as well as communications with middle and upper management. The data scientist needs to be able to not only communicate his results, but also understand clearly what is expected of him and engage in a constructive conversation to determine the best possible parameters of the projects he undertakes. He needs to be able to manage the ensuing expectations and make sure that others, especially those in managerial roles, see practical value in what he can provide (without expecting miracles).

Although there are other traits that a data scientist may have, the traits described above are the most essential ones for a good data scientist. When applied with discernment and intelligence, they can help his role develop organically and effectively.

4.2 Qualities and Abilities

Hand in hand with traits are the qualities and abilities of a data scientist, which often depend on his particular specialty. However, there are certain ones that are found in every type of data scientist. The most important of these are the following:

Model Building. This is a fundamental ability of a data scientist, involving the design and implementation of mathematical models that can be used to solve the data-related problems he is asked to tackle. Stemming from the need to scientifically explain and predict certain phenomena that are reflected in the available data, model building is a key skill for every data scientist. This is also one of those things that differentiate him from most statisticians and the majority of data professionals. It involves understanding and creativity as well as a great deal of imagination. The models built by a data scientist are implemented in an interactive environment, so it goes without saying that a certain amount of programming also takes place and that the models created take into account the available resources, using them in a very effective and efficient manner.

Building a model, though, is not that easy. The model has to be as simple as possible without being too simple. For example, a simple model may be able to predict how many people will attend a football match based on how large the fan clubs of the participating teams are, the expected weather on that day, and the time of the year, while an overly simple model may try to predict the same thing using only one of these features. The model has to be able to generalize so that it can predict a lot of different cases that may not be entirely akin to the ones that were used to create it. It has to be easy to change and easy to understand for everyone who uses it, especially those who may need to fine-tune it. Model building can be based on mathematics, a computational algorithm, or, more often, a combination of both. The key thing is efficiency, so this is something that the data scientist needs to factor into the whole process. What good would a perfect model be if it took weeks to provide any results, or if it required a huge number of computers to run it? Also, it goes without saying that a data scientist needs to be able to evolve and fine-tune his models, customizing them to different circumstances and adapting them to the data when it changes.
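To make the difference between a simple and an overly simple model more concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the attendance data is synthetic, the coefficients in make_match are invented, and a plain least-squares fit stands in for whatever modeling method a data scientist would actually choose. It fits one model on all three features and one on fan-club size alone, then compares their errors on matches that neither model saw during fitting.

```python
import random

# Hypothetical synthetic data: the coefficients below are invented for
# illustration and do not come from real attendance figures.
rng = random.Random(0)

def make_match():
    fans = rng.uniform(5, 50)                   # combined fan-club size, in thousands
    rain = rng.random()                         # forecast probability of rain
    peak = 1.0 if rng.random() < 0.5 else 0.0   # 1.0 if played in peak season
    attendance = (4000 + 600 * fans - 8000 * rain + 5000 * peak
                  + rng.gauss(0, 1000))         # random noise
    return (fans, rain, peak), attendance

def fit_least_squares(rows, targets):
    """Ordinary least squares via the normal equations (X^T X) w = X^T y."""
    X = [[1.0] + list(r) for r in rows]         # prepend an intercept column
    n = len(X[0])
    # Augmented matrix [X^T X | X^T y]
    A = [[sum(x[i] * x[j] for x in X) for j in range(n)]
         + [sum(x[i] * y for x, y in zip(X, targets))]
         for i in range(n)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(n):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][n] / A[i][i] for i in range(n)]

def predict(weights, row):
    return weights[0] + sum(w * v for w, v in zip(weights[1:], row))

matches = [make_match() for _ in range(300)]
train, test = matches[:200], matches[200:]      # hold out 100 unseen matches

def evaluate(columns):
    """Fit on the training matches using only the given feature columns and
    return the mean absolute error on the held-out matches."""
    def pick(features):
        return [features[c] for c in columns]
    weights = fit_least_squares([pick(f) for f, _ in train],
                                [y for _, y in train])
    errors = [abs(predict(weights, pick(f)) - y) for f, y in test]
    return sum(errors) / len(errors)

simple_error = evaluate([0, 1, 2])      # fan clubs, weather, time of year
too_simple_error = evaluate([0])        # fan clubs only

print(f"three features: {simple_error:.0f}   one feature: {too_simple_error:.0f}")
```

Because the error is measured on held-out matches, the comparison also touches on the ability to generalize: a model is judged on cases that were not used to create it.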

Planning. This is an obvious quality for anyone in the data-related professions, but it is especially useful for a data scientist as it is very easy to get carried away with analyzing the available data, experimenting with various models, and not dedicating sufficient time for other tasks such as documenting the process and the results or creating the corresponding visuals, comprehensive presentations and reports. In addition, a data scientist needs to be able to factor in potential delays, technical issues, communication lags, etc. in order to make sure that he can meet all the deadlines of the projects he undertakes. He needs to be able to think like a project manager and have a practical approach to assessing time durations of different tasks and plotting a realistic and efficient plan of action for all the projects he undertakes.

Problem Solving. This is a key quality for any scientist, particularly a data scientist; it involves being able to focus on solutions rather than on the restrictions that a problem presents. Often, the data scientist has not encountered these solutions before, so it requires a certain amount of imagination and creativity. It means being able to look at the problem at hand from different angles, with different eyes and an open mind.

Problem solving often involves finding ways to hack existing technologies to work around a problem. Data science is rarely clearly defined (similar to most academic endeavors), and every problem it deals with is unique. That’s why a data scientist is often more akin to the hacker than the scientist, as he may have to tackle problems through a lateral thinking approach (see next subchapter for details) and venture off the beaten path. Also, he may need to develop new tools for tackling the problems he faces (e.g., making sense of chaotic big data), building code from scratch or making major modifications to existing code.

Learning Fast. Being able to learn new things and learn them fast is a priceless quality for any profession. However, in a field with constant and rapid changes such as data science, it is particularly useful. It also attests to mental agility and promotes creativity, both invaluable aspects of the mindset suitable for someone who wants to tackle big data problems. Learning fast means being very methodical, selective, and able to assess different sources of knowledge. It requires great discipline and mental plasticity. Almost anyone can learn like that at a relatively early age, but being able to maintain this openness throughout adulthood is a challenge to most people. A data scientist accepts this challenge and does not let age dictate what he can or cannot learn, nor how fast he can do so. His disciplined and nimble mind makes sure of that.

Key elements for learning fast are motivation and being able to perceive the applicability of new material. If you keep this in mind, it will be easier to develop this ability and use it effectively in your journey as a data scientist.

Adaptability. An essential quality for a data scientist is the ability to adapt to new circumstances and new situations. In a way, data science is like a safari; you have some idea of what the available game is, but you don’t know when or where you’ll find it. One thing is certain: you will need to be versatile and able to find ways to adapt your know-how and techniques to the (often unique) data problems you will be asked to tackle. Also, the methods you are going to use may need to be adapted to be capable of handling the form of the available data. This quality also enables the data scientist to work in different industries without being restricted to any one in particular. After all, the language of information is universal.

Teamwork. As mentioned earlier, a data scientist needs to be able to communicate and collaborate effectively with other professionals who are often not familiar with his field. He needs to be a good team player and not let the uniqueness of his role corrupt his character, turning him into a person full of himself and incapable of working well with others. A data scientist is assertive when it comes to presenting and defending his work but is also modest and open to new ideas from his colleagues. He puts the interest of the team before his own interests and is secure enough to not always have to prove himself. He is intelligent enough to figure things out on his own, but also mature enough to see that through brainstorming and collaboration, he can arrive at the same (or better) results significantly faster. An independent professional, he is disciplined enough to work on his own but is also a good contributor to team meetings and is easy to work with.

Flexibility. Flexibility is another important quality for a data scientist to have. It is akin to adaptability and the mental agility mentioned earlier. It enables the data scientist to be versatile and non-rigid when dealing with data problems, technical issues and other obstacles (challenges) in his work. Flexibility is crucial when it comes to new problems or new data structures that have never been encountered before. It hones a can-do attitude that allows him to deal with novel situations effectively, efficiently and creatively. This simple quality is the glue that ties all the other qualities and abilities together, enabling the data scientist to organically evolve his techniques and even his thinking, thus making him an invaluable asset to his organization.

Research. This has nothing to do with academic research, although it is scientific in essence. The data scientist is able to understand and evaluate the current state of the art in his field and find all the knowledge resources that are required for his tasks. This entails more than looking things up on a search engine or a knowledge base, though. Finding quality sources is crucial for tackling the challenging problems of big data, and it requires a trained eye to see which methods are applicable and efficient when applied to a specific problem. It also entails putting together documents describing new methods he develops in a concise, scientifically robust and replicable way. Whether or not these documents are publishable is another matter and not related to how useful the described techniques are.

This ability ties very well with learning fast as it enables the data scientist to be self-sufficient when it comes to learning. In addition, it makes it possible for him to train others as well as have something to share at data science conferences and other relevant events if he so chooses. Needless to say, it is particularly vital in the initial stages of his career, especially if he has a disposition towards innovation.

Attention to Detail. A data scientist needs to be attentive to details since that is usually where useful information lurks. Also, a small detail may cause syntactical or, even worse, logical errors in his programs, slowing him down and compromising his deadlines. Apart from the efficiency boost, this ability is very useful in other ways as well. For example, certain details in the available data may hint at one data analysis approach over another, or point towards a particular set of features that could simplify the problem significantly, even improving the results of the analysis. Also, attention to detail can help the data scientist pinpoint anomalous data points in a data set, enabling him to anticipate problems in advance.
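As a small illustration of pinpointing anomalous data points, the sketch below flags values that sit unusually far from the median of a data set. It is one possible approach among many, not a prescribed method: it uses the modified z-score based on the median absolute deviation, with the commonly quoted rule-of-thumb constants 0.6745 and 3.5, and the function name flag_anomalies and the sample readings are made up for this example.

```python
import statistics

def flag_anomalies(values, threshold=3.5):
    """Return the indices of values whose modified z-score exceeds threshold.

    The modified z-score uses the median and the median absolute deviation
    (MAD), so a single extreme value does not distort the yardstick itself.
    """
    median = statistics.median(values)
    mad = statistics.median([abs(v - median) for v in values])
    if mad == 0:
        return []   # at least half the values equal the median; nothing to scale by
    return [i for i, v in enumerate(values)
            if 0.6745 * abs(v - median) / mad > threshold]

# Hypothetical sensor readings with one suspicious entry
readings = [10.1, 9.8, 10.3, 10.0, 9.9, 55.0, 10.2, 10.1]
flagged = flag_anomalies(readings)
print(flagged)   # prints [5], the position of the 55.0 reading
```

A flagged point is only a prompt to look closer. Whether it is a data-entry error or a genuinely interesting observation is a judgment the data scientist still has to make.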

Reporting. Last, but certainly not least, reporting is a useful ability for a data scientist. It entails creating documents that summarize his work, creating visuals that depict his results, putting these results in perspective, creating comprehensive presentations, etc. A data scientist’s reports need to be understandable by non-technical people while still maintaining scientific rigor. Also, reporting provides a way to document progress on various projects in an easy-to-access manner. Reporting employs organization and communication and forges a link between the data science world and the business world.

Other qualities and abilities may be required for certain specialties of a data scientist, but having the ability to learn new things may compensate for any lack of skills that you may have.

Fig. 4.2 A data scientist is not your average IT professional.

4.3 Thinking

The data scientist’s way of thinking is the most important attribute to keep in mind since it often distinguishes him from other types of professionals. In general, a data scientist thinks in a combinatorial, non-linear way. His thinking needs to combine both traditional and lateral thinking and be versatile in employing either pattern when dealing with the challenges that arise in his work.

His thinking is creative when it comes to designing and implementing his models or investigating which approach should be used for tackling a particular problem. His thinking is not bound by unnecessary restrictions when creating or updating the algorithms he decides to use for his data analysis. In that sense, his thinking often resembles that of an artist, a designer and an architect. He does not hesitate to experiment with different approaches and methodologies and is poised to try out different ways to visualize the available data insightfully. Colors and shapes are his tools and can be as applicable as numbers in expressing the information that is waiting to be discovered. In a way, his thinking is very similar to that of the explorer who sets out to find new lands, but his realm is the vast seas of data in the cyberspace universe.

A data scientist’s thinking is also grounded and practical, especially when it comes to building something with limited resources in a constrained timeframe. In that sense, it is similar to the thinking of a civil engineer who opts to make the most of the available space and budget without dwelling much on fancy designs. Just like a civil engineer, a data scientist does not neglect the given requirements and tailors his creative approach to the restrictions of the task at hand. Perhaps he could derive ten or fifteen different metrics from a given dataset to monitor the evolution of a given variable, but he only needs four or five of them. And from the dozens of beautiful graphs he could create to depict that dataset over time, he picks only a couple that summarize it most effectively. A data scientist is also an engineer of sorts and always thinks and behaves in a pragmatic and down-to-earth manner.

The data scientist’s thinking is also self-reflective and, in a way, meta-cognitive. He investigates different ways of thinking about things and evaluates his current thinking processes. In essence, a data scientist should be aware of how his mind works and, therefore, be willing to admit to gaps in knowledge (and do something about them). He continually looks for flaws in his own methods and takes the necessary steps to fix them. He is proactive and takes responsibility for how his mind functions and the inputs it uses. He is not afraid to say that he doesn’t know something and makes every effort to acquire the relevant resources to help him understand it sufficiently and quickly. This allows him to be a better team member and greatly facilitates communication with others.

Most importantly, the mind of the data scientist evolves over time. Modern neuroscience confirms the brain’s life-long ability to change and create new connections within itself. The thinking of the data scientist today is not the same as it was last year, and it is not going to be the same next year. His mind embraces change and uses it to upgrade itself through new experiences, new knowledge and new know-how. In some professions, it may be sufficient to have more or less static thinking, but data science is not one of them. The data scientist is similar to entrepreneurs, managers and inventors in that he is continuously learning new things and adapting his thinking to the ever-changing circumstances of our fast-paced world.

Of course, the thinking of a data scientist is not limited to the above meta-descriptions, and a book subchapter may not be capable of doing it justice. The above guidelines do, however, pinpoint some of its main aspects and hopefully provide incentive for looking into it in greater depth through a conscious evaluation of your thinking as you learn more about data science in general.

Fig. 4.3 Thinking is an important aspect of the data scientist’s mindset.

4.4 Ambitions

It seems a bit unconventional for a book like this to talk about a professional’s ambitions, as this is something that is very personal and somewhat relative. However, there are certain aspirations that are more or less common to data scientists; understanding them may provide useful insight into their mindset.

A data scientist aspires to master big data in its many forms. Being able to deal with a particular data set in this domain is great, but often not enough. Someone who cares for data science finds ways, often through interaction with other professionals in this field, to be on top of the data that is out there, meaning that he comprehends fully what each data type can offer to an organization, what useful information he can potentially derive from it and what costs acquiring each data type entails. This stems from the dream of continuous improvement, which is quite feasible in fields like this where more and more tools become available as new data analysis methods are developed all the time.

Data scientists also constantly want to learn new things. This wish ties quite well with the previous ambition of mastering big data since learning, especially when related to diverse things that include the realm of big data, has been proven to aid in the development of creativity and mental agility. These are essential aspects of the role of the data scientist, and cultivating them makes perfect sense. A data scientist’s interests are not limited to the data science techniques that he may use in his everyday work. He is also interested in new developments in artificial intelligence, distributed computing, information security, new programming languages and machine learning, among other fields.

Fig. 4.4 A data scientist is not without ambitions.

Finally, a data scientist aspires to familiarize himself with the open problems and challenges that exist in the big data world as well as the opportunities that are available through the intelligent processing of company data. He may want to research new ways of tackling problems through the use of new technologies, development of new methods, etc., or he may look into how specific business requirements can be fulfilled through the use of certain kinds of data that are available or can be acquired in a cost-effective manner.

The aforementioned ambitions are examples of what a data scientist wants professionally. The bottom line is that he is not static, but always wanting to be more than what his job description implies. If you keep this in mind, along with the other aspects of his mentality, you will have a clearer understanding of the mindset required for this intriguing role. As a result, you will be able to place the specific skills and knowledge that he has in context with his work and have a more holistic view of this profession.

4.5 Key Points

  • The most important traits a data scientist has are:
    • Curiosity
    • Experimentation
    • Creativity and Systematic Work
    • Communication
  • The main qualities and abilities of a data scientist are:
    • Model Building
    • Planning
    • Problem Solving
    • Learning Fast
    • Adaptability
    • Teamwork
    • Flexibility
    • Research
    • Attention to Detail
    • Reporting
  • A data scientist aspires, among other things, to:
    • Master big data in its many forms
    • Constantly learn new things
    • Familiarize himself with the open problems and challenges that exist in the big data world as well as the opportunities that are available