© Jason Lee Hodges 2019
J. L. Hodges, Software Engineering from Scratch, https://doi.org/10.1007/978-1-4842-5206-2_15

15. Further Study

Jason Lee Hodges
Draper, UT, USA

You should be filled with great pride to have reached the culminating chapter of this book. However, you should also recognize that in doing so, you have not reached the end of your journey into the ever-expanding universe of computer science, but rather simply the end of the beginning. By reading this book, you have been introduced to many of the fundamental ideas and theories surrounding computer science and software engineering. However, it is imperative to understand that, though the core concepts have not changed in decades, software engineering is a rapidly evolving industry. New languages, frameworks, and combinations of paradigms and tools are continually being created. To be a successful software engineer, you must strive to keep up with these changes. Some of the most productive engineers are successful because they are chronic autodidacts, or self-learners. In this final chapter, you will be introduced to several areas of specialty in software engineering that you can explore further on your own to deepen your understanding.

Database Administration

While you have been introduced to common data structures in which to store data during the execution of your software, you have not yet been exposed to where data is housed when your program is not running. In our model operating system, we briefly covered the idea that software can create text files and store them within a file system; however, most software application data is stored in databases.

A database is an organized collection of related data in a single system. Databases typically contain several tables which house rows of data with a consistent schema. A schema is a definition of the label and type of data contained in each column of a table in a database. For example, if you need to keep track of different users that have access to your software, you might house that data in a table that contains information about the user. That table will have several columns of information like the user’s name, their email address, the number of times they have logged in, and the date they last logged in. The schema of that table would be a column named username which is of type string, a column named email which is also of type string, a column named logins which is of type integer, and a column named lastLogin which is a date type. Each row in this table would provide information matching this schema for one user that has access to your system.
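As a minimal sketch, the user table schema described above could be mirrored in Scala as a case class, where each field corresponds to a column and each instance corresponds to a row. The names SchemaExample and UserRow and the sample values are assumptions made for illustration; only the column names and types come from the text.

```scala
import java.time.LocalDate

object SchemaExample {
  // One field per column in the schema: a label and a type.
  final case class UserRow(
    username: String,
    email: String,
    logins: Int,
    lastLogin: LocalDate
  )

  // Each element plays the role of one row in the table (sample data is made up).
  val users: List[UserRow] = List(
    UserRow("ada", "ada@example.com", 12, LocalDate.of(2019, 6, 1)),
    UserRow("grace", "grace@example.com", 3, LocalDate.of(2019, 5, 20))
  )
}
```

A real database enforces the schema when rows are written; here the Scala compiler plays a similar role by rejecting a UserRow whose fields have the wrong types.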

Databases can contain several tables that usually relate to each other in some way and that can be joined together via a query language. The most widely used of these languages is Structured Query Language, or SQL. Database systems that allow tables to be joined together in a related way are known as Relational Database Management Systems, or RDBMS. Nearly all programming languages have database driver modules that can integrate directly with these database systems. This enables you to store your application data in an organized and efficient manner rather than writing it to disparate text files on a file system that are difficult and inefficient to search through and analyze. Common relational databases include MySQL, Oracle, and PostgreSQL.
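To make the idea of a join more concrete, here is a rough sketch that imitates an inner join using plain Scala collections rather than a real database. The orders table, its columns, and every name in the code are hypothetical; the SQL in the comment shows roughly the query being imitated, assuming tables named users and orders.

```scala
object JoinExample {
  final case class User(id: Int, username: String)
  final case class Order(id: Int, userId: Int, total: Double)

  // Two small in-memory "tables" (sample data is made up).
  val users  = List(User(1, "ada"), User(2, "grace"), User(3, "linus"))
  val orders = List(Order(10, 1, 25.0), Order(11, 1, 5.5), Order(12, 2, 40.0))

  // Roughly: SELECT u.username, o.total
  //          FROM users u JOIN orders o ON o.userId = u.id
  def usernamesWithTotals: List[(String, Double)] =
    for {
      u <- users
      o <- orders
      if o.userId == u.id
    } yield (u.username, o.total)
}
```

Users without a matching order (linus in this sample) do not appear in the result, which is the behavior of an inner join. A database performs the same matching far more efficiently by using indexes instead of comparing every pair of rows.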

There are also databases that are not considered relational and do not store their data in tables. These are often referred to as non-relational databases or NoSQL databases. Many of these databases organize data into collections rather than tables and contain loosely structured documents of data rather than schema-defined rows. While this type of organization is arguably harder to query and has no relational structure, it offers advantages in scalability and in efficiency under heavy access. It is also an extremely flexible paradigm, as you don’t need to plan and organize a schema for your data ahead of time; rather, each document can contain any label, value, and type that it needs to contain. Common non-relational databases include MongoDB and CouchDB.
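In contrast with a schema-defined row, a collection of documents can be pictured as a list of maps in which each document carries its own labels. The following is only an illustration in plain Scala and is not the API of MongoDB or CouchDB; the object name, field names, and values are made up.

```scala
object DocumentExample {
  // Each document is a free-form set of label -> value pairs.
  type Document = Map[String, Any]

  // Two documents in the same collection with different labels and types.
  val users: List[Document] = List(
    Map("username" -> "ada", "email" -> "ada@example.com"),
    Map("username" -> "grace", "logins" -> 3, "tags" -> List("admin", "beta"))
  )

  // Because no schema is enforced, a reader must cope with missing labels.
  def loginsOf(doc: Document): Int =
    doc.get("logins") match {
      case Some(n: Int) => n
      case _            => 0
    }
}
```

The flexibility comes at a cost: the check that a schema would perform once, when data is written, moves into every piece of code that reads the documents.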

Deciding whether you should use a relational database or a non-relational database is typically the decision of a database administrator. A database administrator is someone who specializes in databases and is tasked with architecting, maintaining, scaling, and querying databases on behalf of an organization. While this role typically does not involve writing any software, many companies rely heavily on database administrators to ensure that their software can query databases efficiently and reliably to power their application logic.

Data Engineering

Oftentimes, once software systems reach a certain critical scale, the data required to run an application, provide interactions between applications, or enable reporting for applications winds up being distributed among many data sources. In order to effectively use this data, someone must be well versed in the interaction of data between applications and how to effectively and efficiently transform data to match several schema or reporting requirements. A data engineer is a software engineer who specializes in accessing and transforming data to set up scalable and reliable data pipelines between applications.

A common theme in software engineering as of late is the idiom of “big data.” Data engineers are skilled in sifting through this big data and formulating efficient structures and schedules to use this data to provide business value. For example, let’s assume you are a data engineer tasked with understanding where to invest programmatic marketing spend for your company. Your directors don’t want decisions surrounding where to invest their marketing dollars to be based on emotions or gut feelings, but rather well-informed data that can be parsed into a direct formula on how much money to allocate to each marketing option. Your marketing options include Google AdWords, Facebook Ads, Twitter Ads, and DoubleClick Ads, all of which are digital advertising platforms. You likely want to invest some of your ad spend in all of these platforms, but what is the most effective distribution? As a data engineer, you will need to programmatically connect to all of these systems, create new advertising campaigns, and continually monitor their results for adjustment.

When connecting to these different advertising systems, you will likely receive performance information about your campaigns in entirely different schemas for each system. How, then, will you objectively compare the results of campaigns across different systems? The job of the data engineer is to build reliable and consistently up-to-date data pipelines that can be transformed into an apples-to-apples comparison, regardless of the data schemas provided by the source system. However, oftentimes the data engineer is not charged with garnering insights from the data once it has been connected, collected, and normalized. In many organizations, that is the job of the data scientists.
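One way to picture this normalization step is as a set of small functions that each convert a source-specific record into a shared schema. The sketch below assumes two invented report shapes, PlatformAReport and PlatformBReport; they do not reflect the actual data returned by any real advertising platform, and all names and units are hypothetical.

```scala
object NormalizeExample {
  // Hypothetical report shapes from two different ad platforms.
  final case class PlatformAReport(campaign: String, costMicros: Long, clicks: Int)
  final case class PlatformBReport(name: String, spendDollars: Double, linkClicks: Int)

  // The common schema that every source is transformed into.
  final case class CampaignResult(source: String, campaign: String, spend: Double, clicks: Int) {
    // Cost per click; None when there were no clicks, to avoid dividing by zero.
    def costPerClick: Option[Double] =
      if (clicks > 0) Some(spend / clicks) else None
  }

  // Platform A is assumed to report cost in millionths of a dollar.
  def fromA(r: PlatformAReport): CampaignResult =
    CampaignResult("A", r.campaign, r.costMicros / 1000000.0, r.clicks)

  // Platform B is assumed to report spend in dollars already.
  def fromB(r: PlatformBReport): CampaignResult =
    CampaignResult("B", r.name, r.spendDollars, r.linkClicks)
}
```

Once every source has been mapped into CampaignResult, a metric such as cost per click can be compared across platforms without caring where each record originated.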

Common tools used by data engineers include Spark, Hadoop, and Airflow. Spark is a big data tool used for data transformation across a distributed network of server nodes that can scale to match the needs of the transformation process depending on the size of the data. Spark offers several language options for data engineers to choose from, but it was originally implemented in Scala, and using it from Scala is therefore often considered to have advantages over using it from other languages. Hadoop is also a distributed data transformation system, and it predates Spark. It is best known for its MapReduce strategy, which is closely related to the higher-order functions map and reduce that were introduced in the functional programming paradigm section of this book. Airflow is a data pipeline scheduling platform that allows you to schedule and coordinate data connection and transformation jobs. It also has built-in tools for retrying jobs that have failed and for reporting the status of all of the jobs in the data pipeline at any given time.
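The relationship between the map reduce strategy and ordinary higher-order functions can be sketched with a word count over Scala collections. This runs on a single machine and uses neither the Hadoop nor the Spark API; the object and function names are assumptions chosen for illustration.

```scala
object WordCountExample {
  // Map step: turn each line into (word, 1) pairs.
  def mapStep(line: String): List[(String, Int)] =
    line.toLowerCase.split("\\s+").filter(_.nonEmpty).map(w => (w, 1)).toList

  // Reduce step: sum the counts for each distinct word.
  def reduceStep(pairs: List[(String, Int)]): Map[String, Int] =
    pairs.groupBy(_._1).map { case (word, ps) => (word, ps.map(_._2).sum) }

  // Apply the map step to every line, then reduce all of the pairs together.
  def wordCount(lines: List[String]): Map[String, Int] =
    reduceStep(lines.flatMap(mapStep))
}
```

In a distributed system, the map step would run on many nodes at once, each handling a slice of the input, and the pairs would be grouped by word across the network before the reduce step sums them.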

Data Science

Once data has been made available from the source application in a data pipeline, data scientists are often charged with obtaining insights from the data to drive business decisions. Oftentimes the hope is that these insights can include predictions about the future based on the data that has been gathered in the past. Recently, these predictive insights are gathered using machine learning and artificial intelligence algorithms that have become the main focus of data scientists’ research. These algorithms rely heavily on proven statistical modeling techniques that have been around for decades. Unfortunately, these techniques could not be harnessed fully until the computing power was capable of handling the big data that is currently being produced in the industry.

In some specific situations, data scientists are not simply extracting insights from data but using the data to make programmatic decisions. For example, in the field of computer vision, data scientists use the data gathered from cameras to make decisions about how a robot should respond to a particular situation or whether a self-driving car should slow down or stop. These techniques are also used in the same manner to create facial recognition software. Recently, advances in data science have made it reasonably simple for computers to take audio data and transform that data into recognizable text. This can then be fed to software as input instructions. All of these amazing advancements in the field of computer science have been made possible by data science.

Data science is a very popular and rapidly growing field of computer science. To harness it correctly, an understanding of the concepts of database administration and data engineering is crucial, along with specialized training in statistics and machine learning. The demand for great data scientists will only continue to grow in the near future as these skills become increasingly valuable.

Embedded Systems

In conjunction with data science, another very popular topic in the software engineering industry currently is the “Internet of Things.” The Internet of Things is a phrase used to describe the ever-increasing availability of connected smart devices. These might include devices such as a smart TV, a smart fridge, or home automation systems. All of these are made possible by embedded systems engineers.

An embedded system is a program that is developed for and run directly on a specific piece of hardware. Often this hardware is a microcontroller that is used to power specific functionality in a device. Examples of embedded systems include the software used to enable a printer to access information from a computer and print it to paper, the system embedded in a mouse that allows it to translate movement of a physical device into coordinates of a cursor on a monitor, or the monitor itself interpreting information from the computer and displaying it graphically. However, embedded systems are not limited to computer peripherals. Embedded systems are all around us. Behind every airplane, automobile, traffic light, drone, and robot is an embedded system.

Because embedded systems operate on small micro-controllers, they typically require software that is extremely efficient and extremely small in size. Because of this, many embedded systems are written in low-level software languages. These languages may include C, C++, or Rust. In many cases, embedded systems might even be written directly in assembly code depending on the constraints of the system.

If embedded systems interest you, a great place to start learning is by buying an inexpensive Raspberry Pi or Arduino board. A Raspberry Pi is a small computer that you can buy online that typically consists of only a credit-card-sized motherboard with common peripheral ports available to hook up monitors, keyboards, or a mouse. To this small motherboard, you can add electrical devices such as infrared sensors, microphones, or actuators. Using these basic add-ons, you can plug the Raspberry Pi into a monitor, load up its onboard operating system (usually a Linux variant called Raspbian), and write simple programs that interact directly with these hardware components. Similar components and do-it-yourself kits are available for Arduino boards; however, they do require a bit more knowledge of embedded systems and C++.

Distributed Systems

A distributed system can be defined as a network of computers working together as one system. The user of this system is not aware that several computers are working together to power their requests. Said another way, the distribution of software across several node computers in the system is abstracted away from the user. Typically, distributed systems are used when software reaches a certain scale and cannot be reliably powered using a single computer.

Many of the topics surrounding software engineering in this book had to do with scalability and reliability. Distributed systems are usually deployed to accomplish these goals for our software. When a database reaches a certain size or requires the capacity for a very large number of concurrent queries, we will often replicate the database across computers to meet these requirements. The user querying the database will not be aware that the database has been replicated and distributed among many computing nodes, but their query performance will improve nonetheless.

The same is true for web sites on the Internet. Early web sites were often powered by single bare metal servers in the garages of start-up entrepreneurs. As their web sites grew, and more and more users attempted to connect at the same time, a single server could no longer handle the network traffic. In this scenario, a user attempting to connect to an overwhelmed server might have to wait several minutes before the web site would respond, or the web site would crash and not respond at all. To remedy this problem, a distributed systems expert might be called in to set up several servers for the web site. The software on the original server would be copied over to the additional servers, and a process would need to be put in place to keep the versions of the web site software in sync on each server. Additionally, extra software would need to be put in place that distributes the incoming user requests among the servers to ensure that no individual server gets overwhelmed. This process of distributing traffic is known as load balancing. Another problem that a distributed systems engineer faces is what to do when one of the computers in the distributed system crashes or fails to process an incoming request successfully. In this case, the distributed systems engineer might want to remove the failing node from the system and proxy the user’s request to another node until the failing node can successfully be brought back online. This is what is known as fault tolerance in a distributed system.
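The two ideas in this paragraph, load balancing and fault tolerance, can be sketched together as a round-robin selector that skips unhealthy nodes. This is a deliberately simplified model and not how any particular load balancer is implemented; the Node type, its healthy flag, and the pick function are all hypothetical.

```scala
object LoadBalancerExample {
  final case class Node(name: String, healthy: Boolean)

  // Pick the node for the n-th request, rotating through healthy nodes only.
  // Returns None when every node is down.
  def pick(nodes: Vector[Node], requestNumber: Int): Option[Node] = {
    val healthy = nodes.filter(_.healthy)
    if (healthy.isEmpty) None
    else Some(healthy(Math.floorMod(requestNumber, healthy.length)))
  }
}
```

With nodes a, b, and c where b is marked unhealthy, successive requests alternate between a and c. Once b is marked healthy again, it rejoins the rotation without any change to the calling code.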

These days, much of the work surrounding distributed systems for web sites is abstracted away by hosting providers that supply what is known as “Infrastructure as a Service” or IaaS. An IaaS provider, such as Amazon Web Services (AWS) or Google Cloud Platform, has several server farms all over the world that it can assign to customers who wish to host their web site software on the provider’s hardware. The IaaS provider then offers a vast number of configuration options to the distributed systems engineer tasked with deploying the web site software onto the provider’s servers. They can configure the load balancing settings, the security policies, the number of servers they wish to include in the distributed system, and many other options. Teams of distributed systems engineers who work with IaaS providers are typically referred to as “Cloud Operations” or CloudOps.

Much of the theory behind distributed systems requires deep knowledge of data structures and graph theory. If distributed systems is a topic that you are interested in, I recommend reading up on graph theory and understanding many of the algorithms related to this topic. Some of the algorithms you may want to consider include graph traversals and shortest path algorithms.
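As a starting point for that study, the sketch below combines a graph traversal with a shortest path search: a breadth-first search that returns the path with the fewest hops between two nodes in an unweighted graph. The adjacency-map representation and all names are assumptions for illustration; weighted graphs call for other algorithms, such as Dijkstra's.

```scala
import scala.annotation.tailrec
import scala.collection.immutable.Queue

object GraphExample {
  // Each node maps to the list of nodes it links to.
  type Graph = Map[String, List[String]]

  // Breadth-first search: returns the fewest-hops path from start to goal, if any.
  def shortestPath(graph: Graph, start: String, goal: String): Option[List[String]] = {
    // Paths in the frontier are stored in reverse, with the newest node at the head.
    @tailrec
    def loop(frontier: Queue[List[String]], visited: Set[String]): Option[List[String]] =
      frontier.dequeueOption match {
        case None => None
        case Some((path, rest)) =>
          val node = path.head
          if (node == goal) Some(path.reverse)
          else {
            val next = graph.getOrElse(node, Nil).filterNot(visited.contains).distinct
            loop(rest ++ next.map(_ :: path), visited ++ next)
          }
      }
    loop(Queue(List(start)), Set(start))
  }
}
```

Because the frontier is a queue, every path of length n is examined before any path of length n + 1, which is why the first path that reaches the goal is guaranteed to be one of the shortest.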

Web Development

Almost every piece of software written today interacts with the Internet in some way. The software written specifically for a web site or a native mobile application that is hosted on a server in the cloud is written by web developers. Typically, the work required to create a web site is divided into two sub-groups: front-end engineers and back-end engineers. Engineers who have expertise in both front-end and back-end technologies are known as full stack engineers.

A front-end engineer’s primary responsibilities include writing the software that provides an interface for interacting directly with the user. This might include creating a navigation menu on a web site, a transition animation between pages, or interactive content in the body of the page. The technologies used on the front-end, at a very basic level, include HTML, CSS, and JavaScript. HTML, or Hypertext Markup Language, is a tag-based markup language, similar in appearance to XML, that defines the structure of a web page and where its elements belong. CSS, or Cascading Style Sheets, provides information to the browser about how these elements should look (what color, how big, what spacing should exist, etc.). After these two structures have been defined for a web page, JavaScript (standardized under the name ECMAScript) is written to create functionality on a web site. There are libraries in many languages, including Scala, that allow you to write web applications in your language of choice that compile down to JavaScript. The JavaScript for a web site is where the real software development occurs in front-end technologies.

A myriad of tools and frameworks have popped up to assist in front-end development, and understanding them and keeping up with them is key to being successful in this role. Common JavaScript frameworks include Angular, React, and Vue.js. Common CSS tool sets include pre-processors like Less and Sass. There are also tools such as Webpack, Gulp, and Grunt used for packaging the contents of your front-end code and minifying them into smaller, more efficient code. If front-end engineering is of interest, you should also consider learning ECMAScript 6, or ES6, which introduced a major set of syntax updates to the JavaScript language.

Back-end engineers typically handle any software application logic that needs to be realized across an entire web site or across both a web site and a mobile application. This usually includes APIs (application programming interfaces) that can be accessed using HTTP, the same protocol that your browser uses to load responses from web servers. These APIs might be responsible for tasks such as authentication, interacting with a database, or interacting with or extending the functionality of other web sites using their APIs. Back-end engineers can use just about any programming language they choose. Web frameworks exist for almost all languages to speed up the development of web-based software. For Scala, two common choices are Play and, through Scala’s interoperability with Java, Spring.

Conclusion

As you can see, there are several unique and specific applications for software engineering skills in the industry. Deciding where to apply your engineering talent can be challenging, especially if your interest spans multiple topics. However, it is almost impossible to become an expert at everything. I would recommend picking a specific field of study and diving into it completely, becoming absolutely saturated by everything there is to know about it, before moving on to the next topic. This will prevent you from becoming the jack of all trades with no mastery in any field. Conversely, some of this might be outside of your control depending on whether you happen to get a job in a certain field or if there is no expertise in a given field and you are expected to fill the gap.

At the very least, hopefully by reading this book you have developed a base level of understanding of everything there is to explore in the world of software engineering. When you start to dive into other languages, you will notice that they all contain the same basic concepts of expressions, data types, control flow, functions, and classes. Also, the data structures and algorithms in this book should be applicable in any language, although the implementation details might vary depending on the primary paradigm of the language. By being exposed to all of the paradigms in one language, you have a unique opportunity to decide which you identify with the most and which you want to explore further in other languages.

I encourage you to continue learning and coding on a daily basis. It’s important to seek out mentorship and a community of other engineers that you can rely on for support and motivation. If you want to be a successful engineer, you can never stop learning. The subject of software engineering is so vast that no one person can ever know it all. There is so much content out there that I expect you to be a student for life.
