Chapter 14
Future Trends and How to Remain Relevant

Data science is a dynamic field that is constantly changing. Keeping up with new developments is therefore not just advisable; it is expected, and necessary for staying relevant to employers and clients. Otherwise, your know-how is bound to become obsolete sooner or later, making you a less marketable professional. To avoid this, it is important to learn about the newest trends and have strategies in place for remaining relevant in this ever-changing field.

In this chapter, we will examine general trends that are bound to shape data science in the coming decade. These include the role of AI, the future of big data, new programming paradigms, and the rise of Hadoop alternatives. In addition, we will look at ways to remain relevant in data science, such as the versatilist approach, data science research, continuous self-education, collaborative projects, and mentoring. This chapter cannot guarantee that you will become future-proof, but it will help you be better prepared, so that you ride the waves of change instead of being swallowed by them.

General Trends in Data Science

Even though data science is a chaotic system whose many changes are next to impossible to predict, some general patterns, or trends, do appear to emerge. By learning about the trends of our field, you will be better equipped to prepare yourself and adapt effectively as data science evolves.

The Role of AI in the Years to Come

Hype aside, the fact is that AI has made its entrance into data science, and it is here to stay. This does not mean that everything in the future will be AI-based, but it is likely that AI methods, like deep learning networks, will become more and more popular. Some conventional methods will probably still be around due to their simplicity or interpretability (e.g. decision trees), but they are unlikely to be the go-to methods in production.
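
To illustrate the interpretability point, here is a minimal sketch (assuming Python with scikit-learn installed) that fits a small decision tree on a toy dataset and prints its logic as plain if/else rules, something a deep learning network cannot offer out of the box:

    from sklearn.datasets import load_iris
    from sklearn.tree import DecisionTreeClassifier, export_text

    # Fit a shallow decision tree on a toy dataset.
    iris = load_iris()
    tree = DecisionTreeClassifier(max_depth=2).fit(iris.data, iris.target)

    # The fitted model can be rendered as human-readable rules.
    print(export_text(tree, feature_names=iris.feature_names))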

Keep in mind that AI is an evolving field as well, so the methods that are popular in data science today may not necessarily be popular in the future. New ANN types and configurations are constantly being developed, while ANN ensembles have been shown to be effective too. Always keep an open mind about AI and the different ways it applies to data science. If you have a researcher’s mindset and the patience for it, a post-grad program in AI may be worth pursuing.

Big Data: Getting Bigger and More Quantitative

It may come as a surprise that big data is getting more quantitative, since the majority of it consists of text and other kinds of unstructured data that have not been harnessed yet, often referred to as dark data. However, as the Internet of Things (IoT) becomes more widespread, sensor data is becoming increasingly available. Although much of it is not directly usable, it is quantitative, and as such capable of being processed extensively with various techniques (such as statistics).

In addition, most AI systems out there work mainly with quantitative data (even discrete data needs to be converted to a series of binary variables in order to be used). Therefore, many data acquisition processes tend to focus on quantitative data when populating the databases they are linked to, making it ever more abundant.
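
As a minimal illustration of that binary-variable conversion (commonly called one-hot encoding), here is a short Python sketch using pandas; the column and category names are made up:

    import pandas as pd

    # A toy column of discrete (categorical) values.
    df = pd.DataFrame({"status": ["ok", "faulty", "ok", "offline"]})

    # Each category becomes its own 0/1 column -- the purely quantitative
    # representation that most AI systems expect as input.
    print(pd.get_dummies(df, columns=["status"]))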

As for the growth of big data, this is no surprise, considering that the various processes that generate data, whether from the environment (through sensors) or via our online activities (through web services), grow exponentially. Also, storage is becoming cheaper and cheaper, so collecting this data is more cost-effective than ever before. The fact that there are many systems in place that can analyze that data makes it a valuable resource worth collecting.

New Programming Paradigms

Although Object-Oriented Programming (OOP) is the dominant programming paradigm at the moment, this is bound to change in the years to come. Some robust functional languages have already made their appearance in the field (see Chapter 4 for a recap), and languages of that paradigm are unlikely to go away any time soon. Other programming paradigms may arise as well. It would not be far-fetched to see graphical programming take on a more pronounced presence in data science, much like the one featured in the Azure ML ecosystem.
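
For a taste of the functional style, here is a small Python sketch expressing a data transformation as a pipeline of pure functions rather than as methods mutating object state; the readings are arbitrary placeholder values:

    from functools import reduce

    readings = [3.2, -1.0, 4.8, 0.0, 2.5]

    # Keep the positive readings, square them, and sum the squares --
    # all without mutating any state along the way.
    total = reduce(lambda acc, x: acc + x,
                   map(lambda x: x * x,
                       filter(lambda x: x > 0, readings)),
                   0.0)
    print(total)  # 39.53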

Regardless, OOP will not be going away completely, but those too attached to it may have a hard time adapting to what is to come. This is why I strongly recommend looking into alternatives to the OOP languages, as well as bridge packages (i.e. packages linking scripts of one language to another).
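
As an example of such a bridge package, the following sketch uses rpy2 to call R code from within a Python script (this assumes both R and the rpy2 package are installed):

    from rpy2 import robjects

    # Evaluate a line of R code from Python and pull the result back.
    r_mean = robjects.r("mean(c(1, 2, 3, 4, 5))")
    print(float(r_mean[0]))  # 3.0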

In addition, if you are good at the logic behind programming and have the patience to go through a language’s documentation, any changes in the programming aspect of data science shouldn’t be a problem. After all, most new languages are designed to be closer to the user and are accompanied by communities of users, making them more accessible than ever before. As long as you take the time to practice them and work through code on particular problems, a new programming paradigm should be an interesting endeavor rather than something intimidating or tedious.

The Rise of Hadoop Alternatives

Even though Hadoop has been around for some time, alternatives have emerged in the big data arena. Lately, these big data platforms have been gaining ground, leaving Hadoop behind in both speed and ease of use. Ecosystems like Microsoft’s Azure ML, IBM’s InfoSphere, and Amazon’s cloud services have made a dent in Hadoop’s dominance, and this trend doesn’t show signs of slowing down.

What’s more, several systems nowadays sit on the software layer above Hadoop and handle all the tasks that Hadoop’s own programs once did. In other words, Hadoop’s role has diminished to merely offering its file system (HDFS), while the querying, scheduling, and processing of the data are handled by alternative systems like Spark, Storm, H2O, and Kafka. Despite its evolution, Hadoop is being left behind as an all-in-one solution, even if it may remain relevant in the years to come as a storage platform.
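
A minimal PySpark sketch of this division of labor might look like the following, where HDFS merely stores the file while Spark does the querying and processing (the path and column name here are hypothetical):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("hdfs-example").getOrCreate()

    # Read a CSV straight out of HDFS and aggregate it with Spark --
    # no MapReduce jobs involved.
    df = spark.read.csv("hdfs:///data/events.csv", header=True, inferSchema=True)
    df.groupBy("event_type").count().show()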

Other Trends

Beyond the aforementioned trends, there are several others that may be useful for you to know. For example, certain pieces of hardware are becoming very relevant to data science, as they greatly facilitate computationally heavy processes, such as training deep learning networks. GPUs, Tensor Processing Units (also known as TPUs, http://bit.ly/2rqk2bU), and other hardware are moving to the forefront of data science technology, changing the landscape of the computer systems where production-level data science systems are deployed.
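
As a quick illustration, most deep learning frameworks let you check which accelerators are visible on a machine; the following sketch assumes TensorFlow is installed:

    import tensorflow as tf

    # List the accelerators the framework can see on this machine.
    print("GPUs:", tf.config.list_physical_devices("GPU"))
    print("TPUs:", tf.config.list_physical_devices("TPU"))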

Also, with parallelization becoming more accessible to non-specialists, it is useful to remember that building private computer clusters may be easier than people think, as it is cost-effective to buy a bunch of mini-computers (e.g. Raspberry Pis) and connect them in a cluster array. Of course, with cloud computing becoming more affordable and easier to scale, the trend toward clusters on the cloud is likely to continue as well.

There are also new deep learning systems, such as Amazon’s MXNet, making certain AI systems more accessible to non-experts. A trend like this is bound to become the norm, since automation is already fairly commonplace in a variety of data science processes. As we saw earlier, AI is here to stay, so new deep learning systems may be very popular in the future, especially ones that support a variety of programming languages.
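
To give a sense of that accessibility, here is a tiny sketch of MXNet’s NDArray interface, which looks much like NumPy (this assumes the mxnet package is installed):

    from mxnet import nd

    a = nd.array([[1, 2, 3], [4, 5, 6]])

    # Element-wise operations mirror NumPy, lowering the entry barrier.
    print((a * 2).asnumpy())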

Remaining Relevant in the Field

Remaining relevant in data science is fairly easy once you get into the right mindset and allow your curiosity and creativity to take charge. After all, we are not in the field just because it is a great place to be, but also because we are interested in the science behind it (hopefully!) and care about how it evolves. Understanding today’s trends in data science can help with that, as it enhances our perspective and urges us to take action along those trends.

The Versatilist Data Scientist

Some people specialize in one thing, known as specialists, while others know a bit of everything without having a particularly noteworthy strength, known as generalists. Both of these groups have their role to play in the market, and neither is inherently better or worse. However, there is a group that is better than either of them, as it combines aspects of both: the versatilists.

A versatilist is a (usually technical) professional who is good at various things and particularly adept at one of them. This enables him to undertake a variety of roles, even if he only excels in one of them. Therefore, if someone else in his team has trouble with his tasks or is absent for some reason, the versatilist can undertake those tasks and deal with whatever problem comes about. Also, such a person is great at communicating with others, as there is a lot of common ground between him and his colleagues. This person can be a good leader too, once he gains enough experience in his craft.

Being a versatilist in data science is not easy, as the technologies involved are in constant flux. Yet being a versatilist is as close to a guarantee of remaining relevant as you can get. Otherwise, you are subject to the demands of the market and to other people’s limited understanding of the field when it comes to recruiting. Also, being a versatilist in data science allows you to better understand the bigger picture and liaise with all sorts of professionals, within and outside the data science spectrum.

Data Science Research

If you are so inclined (especially if you already have a PhD), you may want to apply your research skills to data science. It might be easier than you think, considering that in most parts of data science, the methods used are fairly simple (especially the statistical models). Still, in the one area where some truly sophisticated models exist (AI), there is plenty of room for innovation. If you feel that your creativity is on par with your technical expertise, you may want to explore new methods of data modeling and perhaps data engineering too. At the very least, you will become more intimately familiar with the algorithms of data analytics and the essence of data science, namely the signal and the noise in the data.

If you find that your research is worthwhile, even if it is not groundbreaking, you can share it with the rest of the community as a package (Julia is always in need of such packages, and it is an easy language to prototype in). Alternatively, you can write a white paper on it (to share with a select few) and explore ways to commercialize it. Who knows? Maybe you can get a start-up going based on your work. At the very least, you will get more exposure to the dynamics of the field itself and gain a better understanding of its trends and how it evolves.

The Need to Educate Oneself Continuously

No matter how extensive and thorough your education in data science is, there is always a need to continue educating yourself if you want to remain relevant. This is easier than people think: once you have the basics down and have assimilated the core concepts through practice and correcting your mistakes, keeping up takes relatively little effort. Perhaps a MOOC would be sufficient for some people, while scientific articles would suffice for others. In any case, you must not grow complacent, since the field is unforgiving to those who think they have mastered it.

Education in the field can come in various forms that go beyond the more formal channels (MOOCs and other courses). Although focusing on a specific learning medium can be beneficial, it is often more practical to combine several of them. For example, you might read articles about a new machine learning method, watch a video on a methodology or technique (preferably one of my videos on Safari Books!), read a good book on the subject, and participate in a data science competition.

Collaborative Projects

Collaborative projects are essential when it comes to remaining relevant in data science. Not only can they help you expand your expertise and perspective, something invaluable toward the beginning of your career, but they can also help you challenge yourself and discover new approaches to solving data science problems. When you are on your own, you may come up with some good ideas, but with no one to challenge them or offer alternatives, there is a danger of becoming somewhat complacent or over-confident, two serious obstacles in any data scientist’s professional development.

Collaborative projects may be commonplace when working for an organization, but sometimes it is necessary to go beyond that. That’s what data science competitions and offshore projects are about. Although many of these competitions offer a skewed view of data science (as the data they have is often heavily processed), the challenges and benefits of working with other people remain. This is accentuated when there is no manager in place and all sense of order has to come from the team itself.

These kinds of endeavors are particularly useful when the team is not physically co-located. Co-working is increasingly an online process rather than an in-person one, with collaborative systems like Slack and GitHub becoming more commonplace than ever. After all, most data science roles do not require someone to be in a particular location all the time in order to accomplish their tasks. Doing data science remotely is not always easy, but if the appropriate systems are in place (e.g. VPNs and a cloud infrastructure), it is not only possible but often preferable.

Collaborative projects can also expose you to data that you may not encounter in your everyday work. This data may require a special approach that you are not aware of (possibly something new). If you are serious about your role in these projects, you are bound to learn through this process, as you will be forced to go beyond your comfort zone and expand your know-how.

Mentoring

Mentoring is when someone knowledgeable and adept in a field shares his experience and advice with people who are newer to it. Although mentoring can be a formal endeavor, it can also be circumstantial, depending on the commitment of the people involved. And even though it is not compulsory, mentoring is strongly recommended, especially for newcomers to the field.

Unlike other more formal educational processes, mentoring is based on a one-on-one professional relationship, usually without any money involved between the mentor and the mentee (protégé). For the former, it is a way of giving back to the data science community, while for the latter, it is a way to learn from more established data scientists. Although mentoring is not a substitute for a data science course, it can offer you substantial information about matters that are not always covered in a course, such as ways to tackle problems that arise and strategic advice for your data science career.

Mentoring requires a great deal of commitment. This is not just to the professional relationship itself, but also to the data science field. It is easy to lose interest or become disheartened, especially if you are new to it, and even more so if you are struggling. Although a mentor can help you in that, he is not going to fight your battles for you. Much like mentors in other professions, a data science mentor is like a guide rather than a tutor.

Even if it is for a short period of time, mentoring is definitely helpful, especially if you are interested in going deeper into the inner workings of data science. It can be of benefit regardless of your level, not just to newcomers. What’s more, even if you are on the giving end of the mentoring relationship, you still have a lot to learn, especially about continuously improving your communication skills. If you have the chance to incorporate a mentoring dynamic into your data science life, it is definitely worth your time and can help you remain relevant (especially if you are the mentor).

Summary

Being aware of the trends of data science and having strategies in place for remaining relevant in the field enables you to remain an asset and make the most of your data science career.

Future trends in data science include:

  • The role of AI becoming more prominent in data science, perhaps even becoming the predominant paradigm
  • Big Data getting bigger and more quantitative due to new technologies such as the Internet of Things (IoT)
  • New programming paradigms, such as functional programming, becoming more commonplace
  • Hadoop alternatives, such as Spark, becoming the norm for handling Big Data
  • Other trends, such as GPUs, TPUs, and other hardware coming to the forefront of data science technology, as well as new DL systems, such as MXNet

AI is bound to evolve in the years to come, so it is advisable to stay up-to-date on it and perhaps even investigate ways to advance it yourself, for example through a post-grad program.

Some of the ways to remain relevant in data science are:

  • Cultivating the versatilist mindset, enabling you to undertake all sorts of data science roles across the pipeline
  • Being aware of the limitations of existing data science methods and advancing them through research
  • Developing a habit of educating yourself throughout your career, through a variety of educational mediums
  • Participating in collaborative projects
  • Mentoring, whether you are the mentor or the mentee