Chapter 19
IN THIS CHAPTER
Modeling environmental-human interaction
Applying statistical modeling to natural resources in the raw
Predicting for a location-dependent environmental phenomenon
Because data science can be used to successfully reverse-engineer business growth and increase revenues, many of its more noble applications slide by completely unnoticed. Environmental data science, one such application, is the use of data science techniques, methodologies, and technologies to address or solve problems related to the environment. This branch of data science falls into three main categories: environmental intelligence, natural resource modeling, and spatial statistics for predicting environmental variation. In this chapter, I discuss each type of environmental data science and how it’s being used to make a positive impact on human health, safety, and the environment.
The purpose of environmental intelligence (EI) is to convert raw data into insights that can be used for data-informed decision making about matters that pertain to environmental-human interactions. EI solutions are designed to support the decision making of community leaders, humanitarian response decision-makers, public health advisors, environmental engineers, policy makers, and more. If you want to collect and analyze environmentally relevant data in order to produce content that’s crucial for your decision-making process, like real-time maps, interactive data visualizations, and tabular data reports, look into an EI solution.
In the following four sections, I discuss the types of problems being solved by using EI technologies and specify which organizations are using EI to make a difference. I explain the ways in which EI is similar to business intelligence (BI) and the reasons it qualifies as applied data science despite those similarities. I wrap up this main section with a real-world example of how EI is being used to make a positive impact.
EI technologies are used to monitor and report on interactions between humans and the natural environment. This information provides decision-makers and stakeholders with real-time intelligence about on-the-ground happenings, in the hope of avoiding preventable disasters and community hardships through proactive, data-informed decision making. EI technologies are being used to achieve the following types of results:
Although environmental intelligence (EI) and business intelligence (BI) technologies have a lot in common, EI is still considered applied data science. Before I delve into the reasons for this difference, first consider the following ways in which EI and BI are similar:
Much of the EI approach was borrowed from the BI discipline. However, EI evolved away from BI technologies when its features were upgraded and expanded to solve real-world environmental problems. When you look into the data science features that are central to most solutions, the evolution of EI away from standard BI becomes increasingly obvious. Here are a few data science processes you won’t find used in BI but will find in EI technology:
Data sources: Unlike BI, EI solutions are built almost solely from external data sources. These sources often include data autofeeds derived from image, social media, and SMS sources. Other external data comes in the form of satellite data, scraped website data, or PDF documents that need to be converted via custom optical character recognition (OCR) scripts. In EI, the reported data is almost always updated in real time.
Web scraping is a process that involves setting up automated programs to scour the Internet and extract the data you need straight from web pages. The data you generate from this type of process is commonly called scraped data.
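To make the idea of scraping concrete, here is a minimal sketch in Python that uses only the standard library. It assumes the data you want sits in an HTML table; the class name TableScraper, the helper functions, and the sample gauge readings are hypothetical, invented for illustration rather than taken from any real EI tool.

```python
"""Minimal web-scraping sketch using only the Python standard library.

Assumption: the target page exposes its data in an HTML <table>.
All names and the sample data below are hypothetical.
"""
from html.parser import HTMLParser
from urllib.request import urlopen


class TableScraper(HTMLParser):
    """Collects the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []          # finished rows, each a list of cell strings
        self._row = None        # row currently being built
        self._in_cell = False   # True while inside a <td> or <th>

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._in_cell = False

    def handle_data(self, data):
        # Only keep text that appears inside a table cell.
        if self._in_cell and self._row is not None:
            self._row[-1] += data.strip()


def extract_table_rows(html):
    """Parse an HTML string and return its table rows as lists of strings."""
    parser = TableScraper()
    parser.feed(html)
    parser.close()
    return parser.rows


def scrape(url):
    """Download a page and return its table rows (requires network access)."""
    with urlopen(url, timeout=10) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return extract_table_rows(response.read().decode(charset))


if __name__ == "__main__":
    sample = """
    <table>
      <tr><th>Station</th><th>Water level (m)</th></tr>
      <tr><td>North gauge</td><td>2.4</td></tr>
      <tr><td>South gauge</td><td>3.1</td></tr>
    </table>
    """
    for row in extract_table_rows(sample):
        print(row)
```

Separating the parsing step (extract_table_rows) from the download step (scrape) lets you test the extraction logic offline. A scraper that runs on a schedule would also need error handling and attention to each site’s terms of use.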
Because EI is a social-good application of data science, there aren’t a ton of funding sources out there, which is probably the chief reason not many people are working in this line of data science. EI is small, but some folks in dedicated organizations have found a way to earn a living by creating EI solutions that serve the public good. In the following list, I name a few of those organizations, as well as the umbrella organizations that fund them. If your goal is to use EI technologies to build products that support decision making for the betterment of environmental health and safety, one of these organizations will likely be willing to help you with advice or even support services:
DataKind (www.datakind.org): A nonprofit organization of data science volunteers who donate their time and skills to work together in the service of humanity, DataKind was started by the data science legend Jake Porway. The organization has donated EI support to projects in developing nations and first-world countries alike. DataKind’s sponsors include National Geographic, IBM, and Pop!Tech.
Elva (www.elva.org): A nongovernmental organization, Elva was built by a small, independent group of international digital humanitarians: knowledge workers who use data and disruptive technologies to build solutions for international humanitarian problems. Elva’s founders gave their time and skills to build a mobile-phone platform that allows marginalized communities to map local needs and to work with decision-makers to develop effective joint-response plans. Elva offers EI support for environmental projects that are centered in underserved, developing nations. Elva is directed by Jonne Catshoek and is sponsored by UNDP, USAID, and Eurasia Partnership.
Vizzuality (www.vizzuality.com): This business was started by the founders of CartoDB, a technology that’s discussed further in Chapter 11. Almost all of Vizzuality’s projects involve using EI to serve the betterment of the environment. Vizzuality was founded by Javier de la Torre, and some of the organization’s bigger clients have included Google, UNEP, NASA, the University of Oxford, and Yale University.
QCRI (www.qcri.com): The Qatar Computing Research Institute (QCRI) is a national organization that’s owned and funded by a private, nonprofit, community development foundation in Qatar. Its social-innovation section delivers ongoing environmental projects, including Artificial Intelligence for Disaster Response (AIDR) and a crowdsourced verification-for-disaster-response platform (Verily).
Elva is a shining example of how environmental intelligence technologies can be used to make a positive impact. This free, open-source platform facilitates cause mapping and data visualization reporting for election monitoring, human rights violations, environmental degradation, and disaster risk in developing nations.
In one of its more recent projects, Elva has been working with Internews, an international nonprofit devoted to fostering independent media and access to information, in an effort to map crisis-level environmental issues in one of the most impoverished, underdeveloped nations of the world: the Central African Republic. As part of these efforts, local human rights reporters and humanitarian organizations are using Elva to monitor, map, and report information derived from environmental data on natural disasters, infrastructure, water, sanitation, hygiene, and human health. The purpose of Elva’s involvement in this project is to facilitate real-time humanitarian-data analysis and visualization to support the decision making of international humanitarian-relief experts and community leaders.
With respect to data science technologies and methodologies, Elva implements the following:
You can use data science to model natural resources in their raw form. This type of environmental data science generally involves some advanced statistical modeling to better understand natural resources. You model the resources in the raw — water, air, and land conditions as they occur in nature — to better understand the natural environment’s organic effects on human life.
In the following sections, I explain a bit about the types of natural-resource issues that most readily lend themselves to exploration via environmental data science. Then I offer a brief overview of which data science methods are particularly relevant to environmental resource modeling. Lastly, I present a case in which environmental data science has been used to better understand the natural environment.
Environmental data science can model natural resources in the raw so that you can better understand environmental processes and how those processes affect life on Earth. Only after environmental processes are clearly understood can environmental engineers step in to design systems that solve the problems these natural processes may be creating. The following list describes the types of natural-resource issues that environmental data science can model and predict:
If your goal is to build a predictive model that you can use to help you better understand natural environmental processes, you can use natural resource modeling to help you. Don’t expect natural-resource modeling to be easy, though. The statistics that go into these types of models can be incredibly complex.
Because environmental processes and systems involve many different interdependent variables, most natural-resource modeling requires the use of incredibly complex statistical algorithms. The following list shows a few elements of data science that are commonly deployed in natural-resource modeling:
The work of Columbia Water Center’s director, Dr. Upmanu Lall, provides a world-class example of using environmental data science to solve incredibly complex water resource problems. (For an overview of the Columbia Water Center’s work, see http://water.columbia.edu/.) Dr. Lall uses advanced statistics, math, coding, and staggering subject-matter expertise in environmental engineering to uncover complex, interdependent relationships between global water-resource characteristics, national gross domestic products (GDPs), poverty, and national energy consumption rates.
In one of Dr. Lall’s recent projects, he found that in countries with high rainfall variability (countries that experience extreme droughts followed by massive flooding), the instability results in a lack of stable water resources for agricultural development, more runoff and erosion, and overall decreases in that nation’s GDP. The inverse also holds: Countries that have stable, moderate rainfall have a better water resource supply for agricultural development, better environmental conditions overall, and higher average GDPs. So, using environmental data science, Dr. Lall has been able to identify strong correlations between a nation’s rainfall trends and its poverty rates.
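The kind of relationship described above can be illustrated, in a greatly simplified form, by summarizing each country’s rainfall variability as a coefficient of variation (standard deviation divided by mean) and correlating it with GDP per capita. The following Python sketch does that under loudly stated assumptions: every number in it is made up, the country names are placeholders, and a single Pearson correlation is a stand-in chosen for clarity, not Dr. Lall’s actual modeling approach.

```python
"""Sketch: relate rainfall variability to GDP per capita.

ASSUMPTION: all numbers below are invented placeholders for illustration;
they are not data from any real study, and the countries are hypothetical.
"""
from math import sqrt
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean (unitless)."""
    return pstdev(values) / mean(values)


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical annual rainfall totals (mm) over five years, per country.
rainfall = {
    "Country A": [820, 790, 805, 830, 810],
    "Country B": [400, 1200, 350, 1500, 300],
    "Country C": [600, 900, 500, 1000, 650],
    "Country D": [1100, 1050, 1120, 1080, 1090],
}
# Hypothetical GDP per capita (USD).
gdp_per_capita = {
    "Country A": 21000,
    "Country B": 1800,
    "Country C": 6500,
    "Country D": 30000,
}

countries = sorted(rainfall)
variability = [coefficient_of_variation(rainfall[c]) for c in countries]
gdp = [gdp_per_capita[c] for c in countries]

for c, cv in zip(countries, variability):
    print(f"{c}: rainfall CV = {cv:.2f}, GDP per capita = {gdp_per_capita[c]}")
print(f"Pearson r (rainfall CV vs. GDP) = {pearson_r(variability, gdp):.2f}")
```

With these invented numbers the correlation comes out strongly negative, matching the direction of the finding described above. A single correlation cannot establish cause, which is why work like Dr. Lall’s pairs statistics with deep subject-matter expertise.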
With respect to data science technologies and methodologies, Dr. Lall implements these tools:
By their very nature, environmental variables are location-dependent: They change with changes in geospatial location. The purpose of modeling environmental variables with spatial statistics is to enable accurate spatial predictions so that you can use those predictions to solve problems related to the environment.
Spatial statistics is distinguished from natural-resource modeling because it focuses on predicting how changes in space affect environmental phenomena. Naturally, the time variable is considered as well, but spatial statistics is all about using statistics to model the inner workings of spatial phenomena. The difference lies in the approach.
In the following three sections, I discuss the types of issues you can address with spatial statistical models and the data science that goes into this type of solution. You can also read about a case in which spatial statistics has been used to correlate natural concentrations of arsenic in well water with cancer incidence.
You can use spatial statistics to model environmental variables across space and time so that you can predict changes in environmental variables across space. The following list describes the types of environmental issues that you can model and predict using spatial statistical modeling:
If your goal is to build a model that you can use to predict how changes in space will affect environmental variables, you can use spatial statistics to help you do this. In the next section, I give a quick overview of the basics involved in spatial statistics.
Because spatial statistics involves modeling the x, y, and z parameters that make up spatial datasets, the statistics involved can get rather interesting and unusual. Spatial statistics is, more or less, a marriage of GIS spatial analysis and advanced predictive analytics. The following list describes a few data science processes that are commonly deployed when using statistics to build predictive spatial models:
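One technique widely used for building predictive spatial models is spatial interpolation: estimating a variable at unmeasured locations from nearby measurements. Geostatisticians often use kriging for this; the Python sketch below uses the simpler inverse distance weighting (IDW) method instead, to show the core idea that nearby samples should count more than distant ones. The well coordinates and arsenic values are hypothetical, and the function name idw and the default power of 2 are assumptions made for illustration.

```python
"""Inverse distance weighting (IDW): a simple spatial interpolation sketch.

Assumptions: sample points are (x, y, value) tuples in a projected
coordinate system, and power = 2 is a common default, not a universal
choice. The well data below is hypothetical.
"""
from math import hypot


def idw(samples, x, y, power=2.0):
    """Estimate the value at (x, y) from measured samples.

    Each sample's weight is 1 / distance**power, so closer measurements
    influence the estimate more than distant ones.
    """
    numerator = 0.0
    denominator = 0.0
    for sx, sy, value in samples:
        d = hypot(x - sx, y - sy)
        if d == 0.0:
            return value  # exact hit: return the measured value itself
        w = 1.0 / d ** power
        numerator += w * value
        denominator += w
    return numerator / denominator


# Hypothetical well locations (km) and arsenic concentrations (micrograms/L).
wells = [
    (0.0, 0.0, 5.0),
    (10.0, 0.0, 15.0),
    (0.0, 10.0, 8.0),
    (10.0, 10.0, 20.0),
]

# The center point is equidistant from all four wells, so it gets their plain mean.
print(f"{idw(wells, 5.0, 5.0):.2f}")  # → 12.00
# A point near the first well is pulled toward that well's low value.
print(f"{idw(wells, 1.0, 1.0):.2f}")
```

Kriging goes further than IDW by fitting a variogram that describes how similarity between measurements decays with distance, and it also produces an estimate of prediction uncertainty at each location.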
A great example of using spatial statistics to generate predictions for location-dependent environmental variables can be seen in the recent work of Dr. Pierre Goovaerts. Dr. Goovaerts uses advanced statistics, coding, and his authoritative subject-matter expertise in agricultural engineering, soil science, and epidemiology to uncover correlations between spatial disease patterns, mortality, environmental toxin exposure, and sociodemographics.
In one of Dr. Goovaerts’s recent projects, he used spatial statistics to model and analyze data on groundwater arsenic concentrations, location, geologic properties, weather patterns, topography, and land cover. Through his recent environmental data science studies, he discovered that the incidence of bladder, breast, and prostate cancers is spatially correlated with long-term arsenic exposure.
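A common first step in this kind of work is checking whether a variable, such as cancer incidence, clusters in space at all. The Python sketch below computes global Moran’s I, a standard spatial autocorrelation statistic, for a hypothetical row of six neighboring areas. It is an illustration only: the rates and the simple neighbor-based weight matrix are invented, and this is not presented as Dr. Goovaerts’s method.

```python
"""Global Moran's I: a basic measure of spatial autocorrelation.

Assumptions: areas are indexed 0..n-1, and weights[i][j] is 1 when areas
i and j are neighbors and 0 otherwise (binary contiguity). The rates
below are hypothetical, not data from any published study.
"""


def morans_i(values, weights):
    """Return Moran's I for `values` under the spatial weight matrix `weights`.

    Values near +1 mean similar values cluster together in space, values
    near -1/(n - 1) (close to 0 for many areas) suggest no spatial pattern,
    and negative values mean neighbors tend to be dissimilar.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    w_sum = sum(sum(row) for row in weights)
    cross = sum(
        weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n)
    )
    variance_term = sum(d * d for d in dev)
    return (n / w_sum) * (cross / variance_term)


# Six hypothetical areas in a row; each area borders the areas next to it.
n_areas = 6
contiguity = [
    [1 if abs(i - j) == 1 else 0 for j in range(n_areas)] for i in range(n_areas)
]
# Hypothetical incidence rates per 10,000 people: low in the west, high in the east.
rates = [2.0, 3.0, 2.5, 9.0, 10.0, 9.5]

print(f"{morans_i(rates, contiguity):.2f}")  # → 0.61 (positive: clustering)
```

A clearly positive value, as here, says that high-rate areas sit next to other high-rate areas. Relating that clustering to an exposure such as arsenic in groundwater then requires additional modeling that brings the exposure data into the analysis.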
With respect to data science technologies and methodologies, Dr. Goovaerts commonly implements the following:
To find out more about Dr. Goovaerts’s work, check out his website at https://sites.google.com/site/goovaertspierre.