If the Earth were sick, nobody would be healthy.
Human footprints on nature significantly degrade the quality of air, water, and soil, pose a great threat to human health, and reduce human life expectancy. Taking air pollution as an example, the primary and secondary annual ambient air standards for fine particulate matter with an aerodynamic diameter of 2.5 µm or less (PM2.5) in the United States are 12 and 15 µg/m³, respectively (US EPA, 2016). The corresponding standards for PM2.5 in China are 15 and 35 µg/m³, respectively (China MEP, 2016). However, the average PM2.5 concentration in 10 provinces of China exceeded 55 µg/m³ in 2015. According to data from the Administration of Energy of China, 64% of China's primary energy comes from coal, and China, with a population of 1.4 billion, burns about 47% of the world's total coal consumption. As shown in Figure 2.1, the annual average PM2.5 concentration in each of the 10 most polluted provinces in China is at least twice the secondary standard of 35 µg/m³.
In China, mortality rates due to environmental damage are significantly greater than natural mortality rates. Lung, liver, and stomach cancers are the top three causes of cancer mortality in China. Epidemiological studies indicate that lung cancer is associated with smog from coal burning, liver cancer with contaminated drinking water and strong liquor, and stomach cancer with food contaminated through soil pollution. According to the National Bureau of Statistics of China (NBSC), about 90% of the 161 cities whose air quality was monitored in 2014 did not meet the Chinese official standards. More than half of China's surface water is so polluted that it cannot be used as a drinking water source. Air pollution, with coal burning as the likely principal cause, kills about 4000 people in China every day; because of its severity, it is a leading cause of lung cancer and respiratory infections. Figure 2.2 shows the percentage of deaths attributable to the leading causes of death in the world. Worldwide, the most common causes of cancer death are (i) lung cancer, with 1.59 million deaths; (ii) liver cancer, 745 000 deaths; (iii) stomach cancer, 723 000 deaths; (iv) colorectal cancer, 694 000 deaths; (v) breast cancer, 521 000 deaths; and (vi) esophageal cancer, 400 000 deaths.
Worldwide, environmental pollution is also a leading cause of cancer and contributes substantially to premature mortality. WHO (2014) estimated that 3.1, 1.6, and 1.5 million people in the world died from lower respiratory infections; trachea, bronchus, and lung cancers; and diarrheal diseases, respectively, as shown in Figures 2.2 and 2.3. There were approximately 14 million new cancer cases in 2012, and the number of new cases is expected to rise by about 70% over the next two decades (WHO, 2014). In 2012, ischemic heart disease, stroke, and chronic obstructive pulmonary disease caused 7.4, 6.7, and 3.1 million deaths, respectively (NBSC, 2012).
Figure 2.4 shows that malignant tumors, the leading cause of death, account for 168 and 159 deaths per 100 000 population in urban and rural China, respectively, or about 28% and 25% of all deaths in urban and rural China.
Further evidence of the environmental health issue can be seen in Figures 2.5 and 2.6. Zhao et al. (2010) showed that lung, liver, stomach, esophageal, and colorectal cancers accounted for 32, 19, 18, 9, and 8% of cancers in urban China, respectively, compared with 24, 23, 23, 15, and 5% in rural China, as shown in Figure 2.5.
To monitor progress in protecting human health, the United Nations uses access to improved sanitation facilities as a metric. Improved sanitation facilities are defined as piped sewerage, septic tanks, pit latrines with slabs, and composting toilets that are not shared with other households. From 1990 to 2015, global coverage of improved sanitation rose from 54% to about 68%, as shown in Figure 2.7, missing the Millennium Development Goal (MDG) target by 9 percentage points. In 2015, about 946 million people worldwide were still practicing open defecation (WHO, 2016).
To achieve the Sustainable Development Goal (SDG) targets, cities all over the world need to reduce open defecation, promote handwashing, and improve the management and treatment of fecal wastes from both sewered and on‐site sanitation facilities. For drinking water, the UN uses piped water on premises, public standpipes, boreholes, protected wells and springs, and rainwater as indicators of improved drinking water. In 2015, 6.6 billion people used improved drinking water sources, while 0.7 billion still used an unimproved drinking water source or surface water (WHO, 2016), as shown in Figure 2.8.
About one‐quarter of improved sources are contaminated with feces, and approximately 1.8 billion people drink water containing such contamination (WHO, 2016). As a result, only 68% of people in urban areas and 20% in rural areas have truly safe drinking water. Unsafe drinking water has been linked to liver cancer through disinfection by‐products (DBPs) and other contaminants, such as antibiotics.
The US EPA establishes environmental standards for carcinogenic and noncarcinogenic chemicals using health risk assessment (HRA). For chemicals that are known or expected to cause adverse health effects, the EPA establishes a nonenforceable maximum contaminant level goal (MCLG) and an enforceable maximum contaminant level (MCL). Under the Safe Drinking Water Act (SDWA), standards and health advisories (HAs) are issued for DBPs. HRA quantifies factors such as absorption during ingestion, pharmacokinetics, mutagenicity, reproductive and developmental effects, and carcinogenicity (Pontius, 1990). MCLs are federally enforceable limits for contaminants in drinking water, established as the national primary drinking water regulations (NPDWRs). Secondary MCLs, also established under the SDWA, are nonenforceable guidelines that protect public welfare by addressing aesthetic qualities such as odor, taste, and appearance. HAs for drinking water contaminants are concentrations considered to be without appreciable health risk for specific durations of exposure; they are not legally enforceable. Similarly, MCLGs are nonenforceable health goals, set at levels at which no known or anticipated adverse effects on human health occur, with an adequate margin of safety. Table 2.1 describes the SDWA standards and HA categories.
Table 2.1 SDWA standards and health advisories.
Term | Description |
Standards | |
MCL (maximum contaminant level) | The enforceable concentration limit for water delivered to users of public water systems |
MCLG (maximum contaminant level goal) | The nonenforceable concentration that protects humans from adverse effects |
Health advisories | |
RfD (reference dose) | An estimate of the daily human exposure that is likely to be without appreciable risk of adverse effects over a lifetime |
DWEL (drinking water equivalent level) | An estimate of the lifetime exposure level that protects humans from adverse noncarcinogenic health effects, assuming that drinking water is the sole source of exposure |
One‐day exposure | The drinking water concentration that is not expected to cause adverse noncarcinogenic effects if exposure continues for up to five consecutive days |
10‐Day exposure | The drinking water concentration that is not expected to cause adverse noncarcinogenic effects if exposure continues for up to 14 consecutive days |
Long‐term exposure | The drinking water concentration that is not expected to cause adverse noncarcinogenic effects if exposure continues for about 10% of a person's lifetime |
Lifetime HA | The drinking water concentration that is not expected to cause adverse noncarcinogenic effects if exposure continues for a lifetime |
To assess an individual's risk, the results of animal bioassays are extrapolated to estimate human risk at human exposure levels: dose–response curves from animal tests are used to derive the equivalent human dose–response curve. The MCLG is set using a three‐category approach that depends on the evidence of carcinogenicity. If the evidence of carcinogenicity is strong, the MCLG is set to zero. If the evidence is limited, one of two methods is used to calculate the MCLG, depending on the toxicity data available. If there is inadequate or no evidence of carcinogenicity, the MCLG is derived from the RfD. For every contaminant that has an MCLG, an MCL or a treatment technique must be established. The MCL is set as close to the MCLG as is feasible with the best available technology (BAT), taking cost into consideration. Therefore, when the MCLG is zero but BAT cannot remove the contaminant completely, the MCL is set above zero; for such carcinogens, the MCL typically corresponds to a lifetime excess cancer risk in the range of 10⁻⁴ to 10⁻⁶.
Traditionally, the reference dose (RfD), expressed in milligrams per kilogram of body weight per day (mg/kg/day), has been used. The RfD is an estimate of the lifetime daily exposure level at which there is no appreciable risk to humans. It is calculated by dividing the no observed adverse effect level (NOAEL) or, when no NOAEL is available, the lowest observed adverse effect level (LOAEL) by an uncertainty factor (UF). The uncertainty factor accounts for differences between animals and humans and for variability within the human population, and it ranges from 10 to 1000 depending on the toxicity data available. The RfD is determined using the following equation:

$$\mathrm{RfD} = \frac{\mathrm{NOAEL\ (or\ LOAEL)}}{\mathrm{UF}}$$
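The RfD is simply the NOAEL (or LOAEL) divided by the uncertainty factor. A minimal Python sketch follows, assuming the common practice of multiplying individual uncertainty factors into one composite UF; the NOAEL of 5 mg/kg/day and the two factors of 10 are hypothetical inputs, not data from this chapter.

```python
import math


def composite_uncertainty_factor(*factors: float) -> float:
    """Multiply individual uncertainty factors, e.g. 10 for animal-to-human
    extrapolation and 10 for variability within the human population."""
    if not factors or any(f < 1 for f in factors):
        raise ValueError("each uncertainty factor must be >= 1")
    return math.prod(factors)


def reference_dose(noael_or_loael: float, uncertainty_factor: float) -> float:
    """RfD (mg/kg/day) = NOAEL (or LOAEL) / UF."""
    if noael_or_loael < 0:
        raise ValueError("NOAEL/LOAEL must be non-negative")
    if uncertainty_factor <= 0:
        raise ValueError("uncertainty factor must be positive")
    return noael_or_loael / uncertainty_factor


# Hypothetical animal study: NOAEL = 5 mg/kg/day, UF = 10 x 10.
uf = composite_uncertainty_factor(10, 10)
rfd = reference_dose(5.0, uf)
print(f"UF = {uf}, RfD = {rfd} mg/kg/day")
```

With these inputs the sketch prints UF = 100 and an RfD of 0.05 mg/kg/day; a larger UF, used when the data are weaker, lowers the RfD proportionally.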
Because the RfD relies on a NOAEL and on large uncertainty factors, it has several major limitations: (i) the NOAEL is limited to one of the doses in the study and therefore depends on the study design, (ii) it does not account for variability in the estimate of the dose–response, (iii) it does not account for the slope of the dose–response curve, and (iv) it cannot be applied when there is no NOAEL, except through the application of an additional uncertainty factor (Crump, 1984; Kimmel and Gaylor, 1988).
To overcome these limitations of the RfD, the EPA developed benchmark dose (BMD) methods, which fit mathematical models to dose–response data and select the BMD associated with a predetermined benchmark response (BMR), such as a 10% increase in the incidence of a particular lesion or a 10% decrease in body weight gain. The output of each model includes a restatement of the model formula and the run options chosen by the user, goodness‐of‐fit information, the BMD, and the estimated lower confidence limit on the BMD (BMDL). The US EPA Benchmark Dose Software version 2.6 (BMDS 2.6; US EPA, 2016) includes nested models, parameter standard error reporting, and parameter initialization for continuous models. BMDS 2.6 contains 30 different models for analyzing dichotomous (quantal) data, continuous data, nested developmental toxicology data, multiple tumors, and concentration–time data. Typical models used in the software are shown in Table 2.2.
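To make the BMD concept concrete, the sketch below solves for the BMD of a multistage model at a 10% extra‐risk BMR by bisection. It assumes the usual multistage form P(d) = g + (1 − g)[1 − exp(−(β1d + β2d² + …))], for which the extra risk [P(d) − P(0)]/[1 − P(0)] does not depend on the background g, and it assumes that the Beta1 and Beta2 estimates in Table 2.9 are the coefficients of d and d². Computing the BMDL requires an additional confidence‐limit step (e.g. profile likelihood) that is not shown.

```python
import math
from typing import Sequence


def multistage_extra_risk(dose: float, betas: Sequence[float]) -> float:
    """Extra risk 1 - exp(-(b1*d + b2*d**2 + ...)); the background cancels."""
    exponent = sum(b * dose ** (k + 1) for k, b in enumerate(betas))
    return 1.0 - math.exp(-exponent)


def benchmark_dose(betas: Sequence[float], bmr: float = 0.10,
                   tol: float = 1e-10) -> float:
    """Dose at which the extra risk equals the BMR (bisection).

    Assumes all betas >= 0 with at least one > 0, so that the extra risk
    increases monotonically with dose, as in constrained multistage fits.
    """
    if not 0.0 < bmr < 1.0:
        raise ValueError("BMR must lie between 0 and 1")
    if any(b < 0 for b in betas) or not any(b > 0 for b in betas):
        raise ValueError("betas must be non-negative, at least one positive")
    lo, hi = 0.0, 1.0
    while multistage_extra_risk(hi, betas) < bmr:  # widen the bracket
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if multistage_extra_risk(mid, betas) < bmr:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Second-degree multistage estimates from Table 2.9 (Beta1, Beta2).
betas_table_2_9 = [0.00930036, 0.00925286]
bmd10 = benchmark_dose(betas_table_2_9, bmr=0.10)
print(f"BMD at 10% extra risk: {bmd10:.3f}")
```

For a second‐degree model the answer can also be obtained from the quadratic formula; bisection is used here because it works for any degree. With the Table 2.9 estimates the sketch gives a BMD of about 2.91 in the study's dose units.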
Table 2.2 Models used in the US EPA Benchmark Dose Software 2.6 (US EPA, 2015).
Model type | Model | Abbreviation |
Continuous | Exponential | exp |
Continuous | Hill | hil |
Continuous | Linear | lin |
Continuous | Polynomial | ply |
Continuous | Power | pow |
Dichotomous | Gamma | gam |
Dichotomous | Logistic | log |
Dichotomous | LogLogistic | lnl |
Dichotomous | LogProbit | lnp |
Dichotomous | Multistage | mst |
Dichotomous | Multistage cancer | msc |
Dichotomous | Probit | pro |
Dichotomous | Weibull | wei |
Dichotomous | Quantal linear | qln |
Dichotomous | Dichotomous hill | dhl |
Dichotomous alternative | Gamma‐BgDose | gmb |
Dichotomous alternative | Logistic‐BgResponse | |
Dichotomous alternative | LogProbit‐BgDose | lpb |
Dichotomous alternative | Multistage‐BgDose | msb |
Dichotomous alternative | Multistage‐Cancer‐BgDose | mcb |
Dichotomous alternative | Probit‐BgResponse | prb |
Dichotomous alternative | Weibull‐BgDose | web |
Nested | Nested logistic | nln |
Nested | NCTR | nct |
Nested | Rai and van Ryzin | rvr |
Repeated response measures | ToxicoDiffusion | txd |
Concentration × time | ten Berge | ten |
Multitumor | MS_Combo | multi |
The SDWA sets standards and HAs for DBPs (US EPA, 1997). However, setting MCLs and MCLGs involves considerable uncertainty, because conservative policy choices can produce discrepancies in lifetime and longer‐term HAs. For example, the uncertainty factor may vary from 5 to 5000 when the lifetime HA concentration is estimated (US EPA, 1996). The US EPA uses human HRA and environmental risk assessment (ERA) to develop both ambient and discharge standards. Risk assessment is a systematic approach to characterizing the nature and magnitude of the risks associated with environmental or health hazards.
Because chlorination is the major disinfection process in the United States, regulating DBP concentrations in drinking water is one of the major challenges faced by the US EPA. Substantial human and financial resources have been devoted to identifying, monitoring, assessing, and regulating the human health effects of DBPs. As a result, HRA guidelines for DBPs were developed to protect the public from both biological and chemical risks. For example, the US EPA developed the Stage 2 Disinfectants and Disinfection Byproducts Rule (DBPR) to reduce peak DBP concentrations in the distribution system. When a water treatment plant (WTP) assesses its disinfection strategy, both the effectiveness of the disinfectant against the target pathogens and the DBPs that the disinfectant forms must be considered in the decision‐making process. Because toxicity tests cannot be performed on humans, animal test data are used, and the associated uncertainty and variability must be quantified. During the HRA of DBPs, the information needed to evaluate DBP risk accurately may be incomplete, and the uncertainty factor in the assessment may be quite large (Cothern et al., 1986). The EPA (2006) set the drinking water standards for total trihalomethanes (TTHM) and five haloacetic acids (HAA5) at 80 and 60 µg/l, respectively, as locational running annual averages (LRAAs). When no specific toxicity information is available, the MCLG can be set on the basis of a quantitative structure–activity relationship (QSAR) study; for example, the MCLGs for 1,1‐dichloroethylene and cis‐1,2‐dichloroethylene were developed using a QSAR approach.
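Compliance with the TTHM and HAA5 standards is judged on the LRAA, the average of the four most recent quarterly results at each monitoring location, rather than on a single sample. The Python sketch below shows that calculation for hypothetical quarterly TTHM data; it treats an LRAA above the MCL as an exceedance and does not model the rule's monitoring schedules, rounding conventions, or operational evaluation levels.

```python
from statistics import mean

# MCLs from the text (µg/l), applied to each location's LRAA.
MCL_UG_PER_L = {"TTHM": 80.0, "HAA5": 60.0}


def lraa(quarterly_results: list[float]) -> float:
    """Locational running annual average: mean of the four most recent quarters."""
    if len(quarterly_results) < 4:
        raise ValueError("an LRAA needs at least four quarterly results")
    return mean(quarterly_results[-4:])


def check_compliance(results_by_site: dict[str, list[float]],
                     group: str) -> dict[str, tuple[float, bool]]:
    """Map each monitoring site to (LRAA, meets the MCL?) for one DBP group."""
    limit = MCL_UG_PER_L[group]
    report = {}
    for site, results in results_by_site.items():
        value = lraa(results)
        report[site] = (value, value <= limit)
    return report


# Hypothetical quarterly TTHM results (µg/l) at two sites, oldest first.
tthm = {
    "site_A": [62.0, 75.0, 91.0, 88.0, 70.0],
    "site_B": [55.0, 68.0, 72.0, 64.0],
}
for site, (value, ok) in check_compliance(tthm, "TTHM").items():
    status = "meets" if ok else "exceeds"
    print(f"{site}: LRAA = {value:.2f} µg/l, {status} the MCL")
```

Site A's oldest result (62 µg/l) drops out of the average, and its LRAA of 81 µg/l exceeds the 80 µg/l MCL even though its latest quarterly result is below it; site B's LRAA is 64.75 µg/l.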
In addition to pathogenic bacteria and viruses, Cryptosporidium is an emerging public health concern for the EPA. The EPA established the Long Term 2 Enhanced Surface Water Treatment Rule (LT2ESWTR) based on Cryptosporidium concentrations in source water and current treatment practices. Under this rule, filtered WTPs are classified into four bins, each with its own additional treatment requirement, as shown in Table 2.3. WTPs with average Cryptosporidium concentrations below 0.075 oocysts per liter (oocysts/l) are placed in bin 1, where no additional treatment is required. For concentrations of 0.075 oocysts/l or more, treatment beyond the existing processes is required. The additional treatment required for each bin, specified in terms of log removal, depends on the type of filtration the WTP already uses. In biological standards, "log" refers to an order‐of‐magnitude reduction in concentration; e.g. 2‐log removal equals a 99% reduction, 3‐log removal a 99.9% reduction, and 4‐log removal a 99.99% reduction. Giardia and viruses require 3‐log and 4‐log removal and/or inactivation, respectively.
Table 2.3 Bin requirements for filtered PWSs (US EPA, 2006).
Additional treatment requirements, by type of filtration treatment operating in full compliance with existing regulations:
Cryptosporidium concentration (oocysts/l) | Bin classification | Conventional filtration treatment (includes softening) | Direct filtration | Slow sand or diatomaceous earth filtration | Alternative filtration technologies |
<0.075 | 1 | No additional treatment | No additional treatment | No additional treatment | No additional treatment |
≥0.075 and <1.0 | 2 | 1‐log treatment | 1.5‐log treatment | 1‐log treatment | As determined by the state |
≥1.0 and <3.0 | 3 | 2‐log treatment | 2.5‐log treatment | 2‐log treatment | As determined by the state |
≥3.0 | 4 | 2.5‐log treatment | 3‐log treatment | 2.5‐log treatment | As determined by the state |
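The bin logic of Table 2.3 and the log‐to‐percent conversion described above can be sketched in a few lines of Python. The dictionary keys for the filtration types are hypothetical labels chosen for this sketch, and None stands for "as determined by the state"; the plant in the usage lines is hypothetical.

```python
# Additional log treatment required by bin, from Table 2.3.
# None means "as determined by the state".
ADDITIONAL_LOG = {
    "conventional": {1: 0.0, 2: 1.0, 3: 2.0, 4: 2.5},
    "direct": {1: 0.0, 2: 1.5, 3: 2.5, 4: 3.0},
    "slow_sand_or_de": {1: 0.0, 2: 1.0, 3: 2.0, 4: 2.5},
    "alternative": {1: 0.0, 2: None, 3: None, 4: None},
}


def lt2_bin(mean_oocysts_per_l: float) -> int:
    """Bin classification from the average source water concentration."""
    if mean_oocysts_per_l < 0:
        raise ValueError("concentration cannot be negative")
    if mean_oocysts_per_l < 0.075:
        return 1
    if mean_oocysts_per_l < 1.0:
        return 2
    if mean_oocysts_per_l < 3.0:
        return 3
    return 4


def percent_reduction(log_removal: float) -> float:
    """Convert an n-log removal into a percentage reduction."""
    return 100.0 * (1.0 - 10.0 ** (-log_removal))


# Hypothetical plant: direct filtration, 1.2 oocysts/l in the source water.
bin_no = lt2_bin(1.2)
extra_log = ADDITIONAL_LOG["direct"][bin_no]
print(f"bin {bin_no}: {extra_log}-log additional treatment "
      f"({percent_reduction(extra_log):.2f}% reduction)")
```

For this plant the sketch reports bin 3 and 2.5‐log of additional treatment, i.e. about a 99.68% reduction.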
To reduce biological and chemical risks simultaneously, UV disinfection appears to be one of the best options for drinking water and treated wastewater effluent. Unlike chemical disinfectants, however, UV leaves no residual that can be monitored to determine the UV dose and the inactivation credit. The UV dose depends on the UV intensity (measured by UV sensors), the flow rate, and the UV transmittance (UVT). To earn disinfection credits, a WTP must therefore establish the relationship between the required UV dose and these parameters and then monitor them to ensure sufficient disinfection of microbial pathogens. The UV dose requirements (mJ/cm²) recommended by the US EPA (2006) are listed in Table 2.4.
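How intensity, flow rate, and UVT combine into a dose can be illustrated with a deliberately idealized model: dose (mJ/cm²) = average intensity (mW/cm²) × residence time (s). The sketch assumes plug flow through a reactor of known volume and a uniform water layer in which the intensity decays by the Beer–Lambert law, with the base‐10 absorbance per cm taken as −log10(UVT). These are simplifying assumptions for illustration, not a validated reactor dose‐monitoring method, and all input values are hypothetical.

```python
import math


def water_factor(uvt_per_cm: float, depth_cm: float) -> float:
    """Depth-averaged fraction of the incident intensity in a water layer.

    Beer-Lambert attenuation with base-10 absorbance a = -log10(UVT) per cm:
    average of 10**(-a*z) over 0 <= z <= depth.
    """
    if not 0.0 < uvt_per_cm <= 1.0:
        raise ValueError("UVT must be in (0, 1]")
    x = -math.log10(uvt_per_cm) * depth_cm
    if x == 0:
        return 1.0  # perfectly transparent water
    return (1.0 - 10.0 ** (-x)) / (x * math.log(10.0))


def uv_dose_mj_per_cm2(intensity_mw_per_cm2: float, uvt_per_cm: float,
                       depth_cm: float, reactor_volume_l: float,
                       flow_l_per_s: float) -> float:
    """Idealized plug-flow dose: average intensity x residence time."""
    residence_time_s = reactor_volume_l / flow_l_per_s
    avg_intensity = intensity_mw_per_cm2 * water_factor(uvt_per_cm, depth_cm)
    return avg_intensity * residence_time_s  # mW*s/cm2 equals mJ/cm2


# Hypothetical reactor: 10 mW/cm2, 5 cm layer, 50 l volume, 25 l/s flow.
for uvt in (0.95, 0.90, 0.80):
    dose = uv_dose_mj_per_cm2(10.0, uvt, 5.0, 50.0, 25.0)
    print(f"UVT {uvt:.2f}: dose = {dose:.1f} mJ/cm2")
```

The output shows the two levers the text names: a lower UVT or a higher flow rate both reduce the delivered dose, which is why these parameters must be monitored continuously.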
Human health risk assessment (HRA) quantifies factors such as absorption during ingestion, pharmacokinetics, mutagenicity, reproductive and developmental effects, and carcinogenicity (Pontius, 1990). Its purpose is to quantify the likelihood that adverse human health effects may occur or are occurring as a result of exposure to one or more stressors (US EPA, 1992). To reduce environmental health mortality, HRA is used to detect issues, identify and characterize hazards, assess exposure, and manage and communicate risk. Using HRA, the EPA developed the Integrated Risk Information System (IRIS), which contains information on the human health effects that may result from exposure to various chemicals in the environment. The IRIS 10⁻⁶ risk level is the contaminant concentration (in µg/l) in drinking water that would yield an additional risk of no more than one in a million (10⁻⁶) after a lifetime of drinking that water. The acute 10‐day values apply specifically to acute toxic effects in children but are expected to be protective for adults. For noncarcinogenic chemicals, this value is typically the same as the MCLG. The chronic (lifetime) values for cancer are set at a level that should yield an additional risk of no more than 10⁻⁶ over a lifetime of exposure. According to IRIS, the EPA classifies cancer risk into six categories: (i) H, carcinogenic to humans; (ii) L, likely to be carcinogenic to humans; (iii) L/N, likely to be carcinogenic above a specified dose but not likely below it; (iv) S, suggestive evidence of carcinogenic potential; (v) I, inadequate information to assess carcinogenic potential; and (vi) N, not likely to be carcinogenic to humans.
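A risk‐specific drinking water concentration such as the 10⁻⁶ level follows from the linear low‐dose assumption, risk = dose × slope factor, with dose = C × intake / body weight. The sketch below assumes the conventional defaults of a 70 kg adult drinking 2 l/day (these defaults are an assumption here, not taken from IRIS entries), and the slope factor of 0.1 (mg/kg/day)⁻¹ is hypothetical.

```python
def risk_specific_concentration_ug_per_l(slope_factor: float,
                                         target_risk: float = 1e-6,
                                         body_weight_kg: float = 70.0,
                                         intake_l_per_day: float = 2.0) -> float:
    """Drinking water concentration (µg/l) giving target_risk over a lifetime.

    Solves target_risk = (C * intake / body_weight) * slope_factor for C,
    assuming a linear dose-response at low doses.
    """
    if slope_factor <= 0 or not 0.0 < target_risk < 1.0:
        raise ValueError("need a positive slope factor and 0 < risk < 1")
    mg_per_l = target_risk * body_weight_kg / (slope_factor * intake_l_per_day)
    return mg_per_l * 1000.0  # mg/l -> µg/l


# Hypothetical oral slope factor of 0.1 (mg/kg/day)^-1.
for risk in (1e-6, 1e-5, 1e-4):
    c = risk_specific_concentration_ug_per_l(0.1, risk)
    print(f"lifetime risk {risk:.0e}: {c:.2f} µg/l")
```

Because the model is linear, the concentration scales directly with the target risk: 0.35 µg/l at 10⁻⁶ becomes 35 µg/l at 10⁻⁴.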
Hazard identification (HI) determines whether exposure to a stressor can increase the incidence of adverse health effects in humans; it reflects the capacity of an agent to cause adverse health effects in humans and other animals (US EPA, 1995). The qualitative description in HI is complemented by QSAR, genetic toxicity, and pharmacokinetic data and by a weight‐of‐evidence evaluation. For example, toxicokinetic data, which describe how the body absorbs, distributes, metabolizes, and eliminates specific chemicals, are commonly used in HI. In the HI of chlorination, chemical disinfectants such as chlorine and chloramines react with natural organic matter (NOM) to produce DBPs that are potential carcinogens and have caused developmental defects in laboratory animals. DBPs have been linked to potential health risks such as liver, kidney, and central nervous system problems and an increased risk of cancer. To quantify the toxicity of DBPs, QSAR analysis calibrated against animal toxicity tests is an important method for estimating DBP carcinogenicity. There are six major classes of DBPs: halogenated alkanes, halogenated alkenes, halogenated aromatics, halogenated aldehydes, halogenated ketones, and halogenated carboxylic acids. DBPs within a class usually share similar toxic modes or mechanisms of action. Table 2.5 lists common DBP classes and their definitions.
Table 2.5 Classification of DBPs.
DBP class | Definition |
Trihalomethanes | Chemical compounds in which three of the four hydrogen atoms of methane (CH4) are replaced by halogen atoms |
Haloacetic acids | Carboxylic acids in which one or more halogen atoms replace hydrogen atoms on the methyl carbon of acetic acid |
Haloacetonitriles | Small organic compounds containing nitrogen, chlorine, and/or bromine. Few data are available on the toxicity of the haloacetonitriles; however, animal studies suggest that dichloroacetonitrile is mutagenic and therefore potentially carcinogenic |
Haloketone | A functional group consisting of a ketone group with an α‐halogen substituent. The general structure is RR′C(X)C(=O)R where R is an alkyl or aryl residue and X any one of the halogens |
Other halogenated compounds | Chloropicrin, chloral hydrate, and cyanogen chloride |
Aldehydes | Organic compounds containing the formyl group, R–CHO, in which a hydrogen atom is bonded to the carbonyl (acyl) carbon |
Inorganic compounds | Chlorine, chloramines, chlorine dioxide, and bromate |
In addition to the main molecular structure (e.g. alkane, alkene, or aromatic), substituents such as chlorine and bromine within each DBP class contribute significantly to the toxicity of the corresponding DBPs. Table 2.6 compares the relative potency of haloacetonitriles in tests of different toxic mechanisms (Pontius, 1990). For example, haloalkanes are procarcinogens that are activated through dehalogenation reactions catalyzed by cytochrome P450 (CYP450). Brominated compounds are expected to have greater alkylating activity than chlorinated compounds. Compounds in this group with chlorine or bromine substituted at the α‐carbon or terminal carbon are expected to be potential alkylating agents, and those with active halogens at both ends of the aliphatic chain are expected to be cross‐linking agents. The stability of several chlorinated ketones in aqueous solution follows the order 1,3‐dichloro > pentachloro ≫ hexachloro. Among the mutagenic chloropropanones (mostly direct acting), the relative mutagenic potency follows the order 1,3‐ > 1,1,3,3‐ > penta‐ > 1,1,3‐ > 1,1,1‐ > 1,1‐, and the potency of the 1,3‐ compound is about 100–1000 times higher than that of dichloroacetic acid (DCA) and trichloroacetic acid (TCA), which are known mouse liver carcinogens and occur as by‐products in drinking water. In animal tests, DCA produces developmental, reproductive, neural, and hepatic effects. As electrophilic compounds, aldehydes may form DNA–protein cross‐links and lead to carcinogenesis.
Table 2.6 Comparison of chemical, biochemical, and biologic properties of haloacetonitriles.
Chemical/biochemical/biologic test for assessing carcinogenic potential | Relative order of potency of the haloacetonitriles tested (Cl = chloro‐, Cl2 = dichloro‐, Cl3 = trichloro‐, Br = bromo‐, Br2 = dibromo‐, BrCl = bromochloroacetonitrile) |
Alkylating activity (4‐(p‐nitrobenzyl)pyridine reaction) | Br2 ≫ BrCl > Cl ≫ Cl2 ≫ Cl3 |
Inhibition of glutathione S‐transferase | Cl3 > Br2 > Cl2 > Br > Cl |
Escherichia coli SOS chromotest | BrCl > Br2 > Cl2 ≫ Cl (inactive) |
Ames or Ames fluctuation tests | Cl2 ≈ BrCl > Cl3 ≈ Br2 > Cl ≈ Br (inactive) |
Ames or Ames fluctuation tests | Cl3 > Cl > Cl2 > Br2 ≈ Br (inactive) |
DNA single‐strand breaks in HeLa cells (comet assay) | Cl3 > BrCl > Br2 > Cl2 > Cl |
DNA single‐strand breaks in HeLa cells (comet assay) | Br2 > Cl3 ≈ Cl2 ≈ BrCl > Cl |
Sister chromatid exchanges in Chinese hamster ovary cells | Br2 > BrCl > Cl3 ≥ Cl2 > Cl |
Newt micronucleus assay | Br2 ≥ Cl3 > Br > Cl2 ≫ Cl |
In vivo mouse micronucleus assay | Br2 ≈ BrCl ≈ Cl3 ≈ Cl2 ≈ Cl (all inactive) |
Lung adenoma assay in strain A mice | Cl ≈ Cl3 ≈ BrCl > Br2 ≈ Cl2 (inactive) |
Skin tumor initiation in SENCAR mice | Br2 ≈ Cl ≈ BrCl > Cl3 (inconsistent) > Cl2 (inactive) |
A dose–response relationship describes how the severity or incidence of adverse health effects (the responses) is related to the amount of exposure to a chemical compound. The measured response usually increases as the dose increases. To establish drinking water standards, dose–response curves from animal toxicity tests are used to derive toxicity slopes or BMDs and to document the dose–response relationship over the range of observed doses. Because human environmental exposures are usually far lower than the doses tested on animals, the health risk must be extrapolated below the lowest observed doses to the dose levels relevant to humans. The shape of the dose–response curve depends on the chemical and on the kind of response, such as the incidence of disease or death. However, there are major gaps in extrapolating from animals to humans and from high doses to low doses to predict human health risk, and great uncertainty can be introduced during these extrapolations, whether nonlinear or linear dose–response models are used for the different modes of action. For example, dose–response parameters derived from animal tumor data include the effective dose producing a response in 10% of the animals (ED10) and the lethal dose for 10% of the animals (LD10). The actual dose depends heavily on the exposure route and the test species. Table 2.7 lists the median lethal doses (LD50) reported by the EPA for a range of chemical compounds, which vary greatly. For dioxin (TCDD), the LD50 is as low as 0.001 mg/kg. For chlorine inhalation, the median lethal concentrations (LC50) for rats and mice are 293 and 137 ppm for a 1‐h exposure, respectively. Overall, the LD50 values in Table 2.7 span seven orders of magnitude, from 10 000 mg/kg for ethyl alcohol to 0.001 mg/kg for dioxin.
Table 2.7 LD50 of typical chemicals based upon animal toxicity test (US EPA, 2006).
Chemical | LD50 (mg/kg) |
Ethyl alcohol | 10 000 |
Sodium chloride | 4 000 |
Ferrous sulfate | 1 500 |
Morphine sulfate | 900 |
Strychnine sulfate | 150 |
Nicotine | 1 |
Black widow | 0.55 |
Curare | 0.50 |
Rattlesnake | 0.24 |
Dioxin (TCDD) | 0.001 |
Chemical | LD50 (with route and animal) |
Caffeine | 620 mg/kg – oral mouse; 192 mg/kg – oral rat; 105 mg/kg – i.v. rat; 68 mg/kg – i.v. mouse |
Chlorine (LC50) | 293 ppm/1 h – rat; 137 ppm/1 h – mouse |
THC (from marijuana) | 175 mg/kg – i.v. mouse; 155 mg/kg – i.v. rabbit; 100 mg/kg – i.v. dog |
Mercury(I) chloride | 210 mg/kg – oral rat; 8 mg/kg – i.v. mouse |
Mercury(II) chloride | 37 mg/kg – oral rat; 10 mg/kg – oral mouse |
Arsenic acid (V oxidation state) | 48 mg/kg – oral rat |
Arsenic trioxide (III oxidation state) | 20 mg/kg – oral rat |
Dimethylarsenic acid (methylated arsenic form used as a cotton defoliant) | 700 mg/kg – oral rat |
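An LD50 is one point on a dose–response curve; the ED10 or LD10 discussed above are other points on the same curve. The sketch below uses a log‐logistic model, P(d) = 1/[1 + exp(−(a + b ln d))], one of the dichotomous models listed in Table 2.2, and inverts it to obtain the dose for any response fraction p. The parameters a and b are hypothetical and are not fitted to any data in this chapter.

```python
import math


def log_logistic_response(dose: float, a: float, b: float) -> float:
    """Fraction of animals responding: P(d) = 1 / (1 + exp(-(a + b ln d)))."""
    if dose <= 0:
        return 0.0
    return 1.0 / (1.0 + math.exp(-(a + b * math.log(dose))))


def effective_dose(p: float, a: float, b: float) -> float:
    """Dose giving response fraction p: ED_p, or LD_p when the response is death."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return math.exp((math.log(p / (1.0 - p)) - a) / b)


# Hypothetical parameters, not fitted to any data in this chapter.
a, b = -4.0, 2.0
ed10 = effective_dose(0.10, a, b)
ed50 = effective_dose(0.50, a, b)
print(f"ED10 = {ed10:.3f}, ED50 = {ed50:.3f}, ED50/ED10 = {ed50 / ed10:.2f}")
```

In this model ED50/ED10 equals 9^(1/b), so with b = 2 the ED10 is one‐third of the ED50; a shallower curve (smaller b) pushes the ED10 much further below the ED50, which is why an LD50 alone says little about low‐dose risk.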
In nonlinear dose–response assessment, the threshold of toxicity is the dose at which the effects (or their precursors) begin to occur. The no observed adverse effect level (NOAEL) is the highest exposure level at which no statistically or biologically significant increase is seen in the frequency or severity of adverse effects between the exposed population and its control population. Different mathematical models are used to estimate the benchmark dose (BMD) and the benchmark dose lower confidence limit (BMDL) at benchmark responses ranging from 1 to 10%, depending on the toxicity test. The BMDL is a statistical lower confidence limit on the dose that produces the selected response. The lowest observed adverse effect level (LOAEL), the NOAEL, or the BMDL is used as the point of departure for extrapolation to lower doses. The EPA (2012) developed guidance on using dose–response modeling to obtain BMDs, i.e. dose levels corresponding to specific response levels near the low end of the observable range, as shown in Figure 2.9, where the points show the fraction of animals affected in each dose group and the error bars show 95% confidence intervals. In reporting a BMD analysis, the EPA requires the following statistical information: (i) rationale, (ii) estimation procedure, (iii) estimates of model parameters, (iv) goodness of fit, such as the log‐likelihood and the Akaike Information Criterion (AIC), and (v) standardized residuals. The US EPA (2012) provides instructive examples of computing BMDs and BMDLs from simple datasets and endpoints using the EPA's BMDS package.
Table 2.9 Parameter estimates with standard errors for 2nd‐degree multistage model.
Parameter | Maximum likelihood estimate (MLE) | Standard error |
Background | 0.12 | 0.132665 |
Beta1 | 0.00930036 | 0.141898 |
Beta2 | 0.00925286 | 0.0246904 |
Table 2.10 Parameter estimates with standard errors for 1st‐degree multistage model.
Parameter | Maximum likelihood estimate (MLE) | Standard error |
Background | 0.111488 | 0.111488 |
Beta1 | 0.120556 | 0.120556 |
Table 2.11 Goodness‐of‐fit table.
Dose | Estimated probability | Expected number responding | Observed number responding | Group size | Scaled residual |
0.0000 | 0.1115 | 5.574 | 6 | 50 | 0.086 |
2.8300 | 0.2417 | 11.842 | 10 | 49 | −0.205 |
5.6700 | 0.3531 | 17.657 | 19 | 50 | 0.118 |
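The estimated probabilities in Table 2.11 can be reproduced with a short maximum likelihood fit of a first‐degree multistage model, P(d) = g + (1 − g)[1 − exp(−β1d)], to the dose groups of that table. The sketch uses only the Python standard library. Writing g = 1 − exp(−u) makes the binomial log‐likelihood concave in (u, β1), so nested golden‐section searches suffice; this optimizer and its search bounds are choices made for the sketch, not the algorithm used by BMDS, and the BMDL is not computed.

```python
import math

# Dose groups from Table 2.11 (doses in the study's own units).
DOSES = [0.0, 2.83, 5.67]
GROUP_SIZES = [50, 49, 50]
RESPONDERS = [6, 10, 19]


def log_likelihood(u: float, beta1: float) -> float:
    """Binomial log-likelihood of P(d) = 1 - exp(-(u + beta1 * d)),
    i.e. the first-degree multistage model with background g = 1 - exp(-u)."""
    ll = 0.0
    for d, n, k in zip(DOSES, GROUP_SIZES, RESPONDERS):
        eta = u + beta1 * d
        p = -math.expm1(-eta)  # 1 - exp(-eta)
        if p <= 0.0:
            return -math.inf
        ll += k * math.log(p) - (n - k) * eta  # log(1 - p) = -eta
    return ll


def golden_max(f, lo: float, hi: float, tol: float = 1e-9) -> float:
    """Maximize a unimodal function on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    x1, x2 = hi - inv_phi * (hi - lo), lo + inv_phi * (hi - lo)
    f1, f2 = f(x1), f(x2)
    while hi - lo > tol:
        if f1 >= f2:  # maximum lies in [lo, x2]
            hi, x2, f2 = x2, x1, f1
            x1 = hi - inv_phi * (hi - lo)
            f1 = f(x1)
        else:  # maximum lies in [x1, hi]
            lo, x1, f1 = x1, x2, f2
            x2 = lo + inv_phi * (hi - lo)
            f2 = f(x2)
    return 0.5 * (lo + hi)


def fit_first_degree_multistage() -> tuple[float, float]:
    """Return the MLEs (background g, beta1), searching beta1 in [0, 2]."""
    def best_u(beta1: float) -> float:
        return golden_max(lambda u: log_likelihood(u, beta1), 0.0, 3.0)

    beta1_hat = golden_max(lambda b: log_likelihood(best_u(b), b), 0.0, 2.0)
    return -math.expm1(-best_u(beta1_hat)), beta1_hat


background, beta1 = fit_first_degree_multistage()
bmd10 = -math.log(1.0 - 0.10) / beta1  # extra risk 1 - exp(-beta1 * d) = 10%
print(f"g = {background:.4f}, beta1 = {beta1:.4f}, BMD10 = {bmd10:.2f}")
for d, n, k in zip(DOSES, GROUP_SIZES, RESPONDERS):
    p = background + (1.0 - background) * (1.0 - math.exp(-beta1 * d))
    print(f"dose {d:4.2f}: P = {p:.4f}, expected = {n * p:.3f}, observed = {k}")
```

The fitted probabilities agree closely with the estimated probabilities in Table 2.11 (0.1115, 0.2417, and 0.3531), and the BMD at 10% extra risk comes out at about 1.88 in the study's dose units.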
For carcinogens, if information on the mode of action is insufficient, linear extrapolation is typically used as the default approach for dose–response assessment. A straight line is drawn from the point of departure for the observed data (typically the BMDL) to the origin (zero dose and zero response). The slope of this line, SF = BMR/BMDL, is referred to as the slope factor (SF) or cancer SF and is used to estimate risk at exposure levels below the observed range. When a linear dose–response is used to assess cancer risk, the excess lifetime cancer risk is calculated as follows:

$$\text{Risk} = \mathrm{CDI} \times \mathrm{SF}$$

where CDI is the chronic daily intake averaged over a lifetime (mg/kg/day) and SF is the cancer slope factor (mg/kg/day)⁻¹.
Total cancer risk is calculated by adding the individual cancer risks for each pollutant in each pathway of concern (i.e. inhalation, ingestion, and dermal absorption) using the reasonable maximum exposure (RME) and then adding together the risks for all pathways.
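The summation of cancer risks across pollutants and pathways can be sketched as follows. The ingestion CDI uses the standard intake form CDI = C × IR × EF × ED/(BW × AT × 365), with the adult defaults of Table 2.12 (EF = 350 days/year, ED = 30 years, BW = 70 kg, AT = 70 years) and an assumed intake of 2 l/day; the chemicals, concentrations, the other CDI values, and the slope factors are hypothetical.

```python
from dataclasses import dataclass


def chronic_daily_intake(conc_mg_per_l: float, intake_l_per_day: float,
                         ef_days_per_yr: float, ed_yr: float,
                         bw_kg: float, at_yr: float) -> float:
    """Lifetime-averaged ingestion dose (mg/kg/day):
    CDI = C x IR x EF x ED / (BW x AT x 365)."""
    return (conc_mg_per_l * intake_l_per_day * ef_days_per_yr * ed_yr
            / (bw_kg * at_yr * 365.0))


@dataclass
class Exposure:
    pollutant: str
    pathway: str              # "ingestion", "inhalation" or "dermal"
    cdi_mg_per_kg_day: float  # lifetime-averaged daily intake
    slope_factor: float       # pathway-specific SF, (mg/kg/day)^-1


def total_cancer_risk(exposures: list[Exposure]) -> tuple[dict[str, float], float]:
    """Sum CDI x SF over pollutants within each pathway, then over pathways."""
    by_pathway: dict[str, float] = {}
    for e in exposures:
        risk = e.cdi_mg_per_kg_day * e.slope_factor
        by_pathway[e.pathway] = by_pathway.get(e.pathway, 0.0) + risk
    return by_pathway, sum(by_pathway.values())


# Hypothetical chemicals; 0.005 mg/l of chemical_X in drinking water.
cdi_x = chronic_daily_intake(0.005, 2.0, 350, 30, 70.0, 70)
exposures = [
    Exposure("chemical_X", "ingestion", cdi_x, 0.1),
    Exposure("chemical_Y", "ingestion", 2.0e-5, 0.05),
    Exposure("chemical_X", "inhalation", 1.0e-5, 0.08),
]
per_pathway, total = total_cancer_risk(exposures)
for pathway, risk in per_pathway.items():
    print(f"{pathway}: {risk:.2e}")
print(f"total excess lifetime cancer risk: {total:.2e}")
```

Keeping the per‐pathway subtotals makes it easy to see which route dominates the total before it is compared with a target risk such as 10⁻⁶.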
Exposure assessment estimates the magnitude, frequency, and duration of human exposure to an agent in the environment, or estimates future exposures for an agent that has not yet been released. Exposure can be measured at the point of contact (the outer boundary of the body), estimated by separately evaluating the exposure concentration and the exposure time under different scenarios, or reconstructed after the exposure from internal indicators (biomarkers, body burden, and excretion levels). Table 2.12 lists the variables used in the US EPA Guidelines for Exposure Assessment (1992) for HRA.
Table 2.12 Specific parameters in health risk assessment.
Parameter | Definition | Default – child | Default – adult |
TRL | Target risk level (unitless) | 10⁻⁶ | 10⁻⁶ |
BW | Body weight (kg) | 15 | 70 |
AT | Averaging time (year) | 70 | 70 |
SFABS | Absorbed cancer slope factor (mg/kg/day)⁻¹ | Chemical specific | Chemical specific |
ED | Exposure duration (year) | 6 | 30 |
EV | Event frequency (events/day) | 1 | 1 |
EF | Exposure frequency (days/year) | 350 | 350 |
FA | Fraction absorbed (unitless) | Chemical specific | Chemical specific |
tevent (RME) | Event duration (h) | 1 (bathing) | 0.58 (showering) |
SA | Surface area (cm²) | 6 600 | 18 000 |
Kp | Permeability coefficient (cm/h) | Chemical specific | Chemical specific |
ABSGI | Fraction absorbed in the gastrointestinal tract (unitless) | Chemical specific | Chemical specific |
τevent | Lag time per event (h) | Chemical specific | Chemical specific |
SFo | Oral cancer slope factor (mg/kg/day)⁻¹ | Chemical specific | Chemical specific |
t* | Time to reach steady state (h) | Chemical specific | Chemical specific |
DAD | Dermal absorbed dose (mg/kg/day) | Site specific | Site specific |
ADevent | Absorbed dose per event (mg/cm²/event) | Site specific | Site specific |
B | Ratio of the permeability coefficient of a compound through the stratum corneum to its permeability coefficient across the viable epidermis (dimensionless) | Chemical specific | Chemical specific |
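The internal dose via the dermal route, expressed as the dermal absorbed dose (DAD, mg/kg/day), can be calculated as follows:

$$\mathrm{DAD} = \frac{AD_{\mathrm{event}} \times EV \times ED \times EF \times SA}{BW \times AT \times 365\ \mathrm{days/year}}$$

where, for organic compounds, the absorbed dose per event is

$$AD_{\mathrm{event}} = 2\,FA\,K_p\,C_w\sqrt{\frac{6\,\tau_{\mathrm{event}}\,t_{\mathrm{event}}}{\pi}} \quad \text{if } t_{\mathrm{event}} \le t^*$$

$$AD_{\mathrm{event}} = FA\,K_p\,C_w\left[\frac{t_{\mathrm{event}}}{1+B} + 2\,\tau_{\mathrm{event}}\,\frac{1+3B+3B^2}{(1+B)^2}\right] \quad \text{if } t_{\mathrm{event}} > t^*$$

and C_w is the concentration of the chemical in water (mg/cm³).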
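For a given cancer risk level of 10⁻⁶, the following equations can be used to estimate the dermal absorbed dose (DAD) in mg/kg/day and the corresponding absorbed dose per event:

$$\mathrm{DAD} = \frac{TRL}{SF_{\mathrm{ABS}}}$$

$$AD_{\mathrm{event}} = \frac{TRL \times BW \times AT \times 365\ \mathrm{days/year}}{SF_{\mathrm{ABS}} \times EV \times ED \times EF \times SA}$$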
Example 2.3 illustrates the steps used to calculate the cleanup level for dermal exposure to compounds in water at an acceptable risk of 10⁻⁶. The default scenarios used in the calculations are (i) a 30‐year adult exposure and (ii) an age‐adjusted 30‐year exposure combining a child bathing for 1 h/event (RME value), once a day, 350 days/year for 6 years with an adult showering for 35 min/event (RME value), once a day, 350 days/year for 24 years. The general equations can be applied to any compound; the example gives the calculation for one compound in water at a cancer risk of 10⁻⁶.
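The two scenarios of Example 2.3 can be sketched in Python. This assumes the organic‐compound dermal equations of the US EPA Risk Assessment Guidance for Superfund (RAGS) Part E and the conversion SF_ABS = SFo/ABSGI, with receptor defaults from Table 2.12. Because the absorbed dose is proportional to C_w, the cleanup level is the target risk divided by the risk per unit concentration; for the age‐adjusted case the child and adult contributions are summed. The chemical parameters are hypothetical, not those of Example 2.3.

```python
import math
from dataclasses import dataclass


@dataclass
class Chemical:
    kp_cm_per_h: float   # Kp, permeability coefficient
    tau_event_h: float   # lag time per event
    t_star_h: float      # time to reach steady state
    b: float             # B, stratum corneum / viable epidermis ratio
    fa: float            # FA, fraction absorbed
    sf_oral: float       # SFo, (mg/kg/day)^-1
    abs_gi: float        # ABSGI, gastrointestinal absorption fraction


@dataclass
class Receptor:
    bw_kg: float
    ed_yr: float
    sa_cm2: float
    t_event_h: float


def ad_event_per_unit_cw(chem: Chemical, t_event_h: float) -> float:
    """AD_event (mg/cm2/event) for Cw = 1 mg/cm3 (organic-compound form)."""
    base = chem.fa * chem.kp_cm_per_h
    if t_event_h <= chem.t_star_h:
        return 2.0 * base * math.sqrt(6.0 * chem.tau_event_h * t_event_h / math.pi)
    b = chem.b
    return base * (t_event_h / (1.0 + b)
                   + 2.0 * chem.tau_event_h * (1.0 + 3.0 * b + 3.0 * b ** 2) / (1.0 + b) ** 2)


def dad_per_unit_cw(chem: Chemical, receptors: list[Receptor],
                    ev: float = 1.0, ef: float = 350.0, at_yr: float = 70.0) -> float:
    """DAD (mg/kg/day) per 1 mg/cm3 in water, summed over age groups."""
    total = sum(ad_event_per_unit_cw(chem, r.t_event_h) * ev * r.ed_yr * ef
                * r.sa_cm2 / r.bw_kg for r in receptors)
    return total / (at_yr * 365.0)


def cleanup_level_ug_per_l(chem: Chemical, receptors: list[Receptor],
                           trl: float = 1e-6) -> float:
    """Water concentration (µg/l) at which SF_ABS x DAD equals the target risk."""
    sf_abs = chem.sf_oral / chem.abs_gi
    cw_mg_per_cm3 = trl / (sf_abs * dad_per_unit_cw(chem, receptors))
    return cw_mg_per_cm3 * 1.0e6  # 1 mg/cm3 = 1e6 µg/l


# Receptor defaults from Table 2.12.
ADULT_30 = Receptor(bw_kg=70.0, ed_yr=30.0, sa_cm2=18000.0, t_event_h=0.58)
CHILD_6 = Receptor(bw_kg=15.0, ed_yr=6.0, sa_cm2=6600.0, t_event_h=1.0)
ADULT_24 = Receptor(bw_kg=70.0, ed_yr=24.0, sa_cm2=18000.0, t_event_h=0.58)

# Hypothetical chemical parameters.
chem = Chemical(kp_cm_per_h=0.01, tau_event_h=0.5, t_star_h=0.8, b=0.05,
                fa=1.0, sf_oral=0.1, abs_gi=1.0)

adult_level = cleanup_level_ug_per_l(chem, [ADULT_30])
age_adjusted_level = cleanup_level_ug_per_l(chem, [CHILD_6, ADULT_24])
print(f"adult: {adult_level:.2f} µg/l, age-adjusted: {age_adjusted_level:.2f} µg/l")
```

The age‐adjusted cleanup level comes out lower than the adult‐only level because the child's larger skin surface area per kilogram of body weight and longer bathing event raise the absorbed dose per unit concentration.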
The following equations are provided by the EPA in the exposure assessment process. The scenario to be evaluated is residential soil. Equations (2.10), (2.11), and (2.12) are used for calculating the soil concentration, Csoil: Child or adult:
Age adjusted:
The age‐adjusted, body‐part‐weighted dermal factor is as presented in equation
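For toxicity assessment, the cancer SF and the RfD can be converted from an administered (oral) dose basis to an absorbed dose basis as follows:

$$SF_{\mathrm{ABS}} = \frac{SF_o}{ABS_{\mathrm{GI}}} \qquad \mathrm{RfD}_{\mathrm{ABS}} = \mathrm{RfD}_o \times ABS_{\mathrm{GI}}$$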
2.3.3.1 Cancer Screening Calculation for Dermal Contaminants in Water
2.3.3.2 Noncancer Screening Calculation for Contaminants in Residential Soil
Many benchmark concentrations can be correlated with ELUMO. ELUMO in turn correlates significantly with the number of chlorine atoms for a given DBP class with a specific carbon number (Tang and Wang, 2010). Example 2.5 illustrates such a correlation.
Risk characterization conveys the risk assessor's judgment as to the nature and presence or absence of risks. The description of how the risk was assessed should indicate where assumptions and uncertainties exist and where policy choices will need to be made. EPA recommends that the characterization fully and explicitly disclose the risk assessment methods, default assumptions, logic, rationale, extrapolations, uncertainties, and overall strength of each step in the assessment. After risk assessment, risk management and risk communication are key to protecting public health. Each component of the risk assessment (e.g. hazard assessment, dose–response assessment, exposure assessment) has an individual risk characterization of its key findings, assumptions, limitations, and uncertainties. Risk characterization applies to both human health risk assessment (HRA) and ecological risk assessment.
The US EPA permits and uses quantitative structure–activity relationship (QSAR) principles to classify and prioritize DBPs. More than 600 DBPs have been identified and cataloged by the US EPA, but only a small fraction of them have been studied for toxicity quantification. QSAR can be used to predict the toxicity of a specific DBP or a molecular property of a DBP such as log P, which reflects the hydrophobicity of a chemical compound. Experts apply the principles of mechanism‐based structure–activity relationships (SAR) relative to known carcinogens. The factors considered include structural analogy to known carcinogens, toxicokinetic and toxicodynamic factors, potency indicators for a structural analog, short‐term test data, and metabolic activation. QSAR analysis therefore plays a unique role in setting drinking water standards.
QSAR analysis is based on the simple premise that similar chemical structures are expected to exhibit similar chemical behavior. With tens of thousands of chemical structures to assess and many different empirical tests, QSAR is often used for strategic screening in product development through HI and risk assessment. Given sufficient knowledge of structurally or functionally related compounds, QSAR may be used to screen well‐defined biological, toxicological, or pharmacological endpoints of interest. It can also screen the associated kinetic characteristics: absorption, distribution, metabolism, and excretion. In QSAR analysis, chemical structure is quantitatively correlated with a well‐defined process, such as the biological activity, chemical reactivity, or toxicity of a chemical compound. The US EPA uses QSAR analysis to prioritize DBPs with three general types of predictive models:
Hundreds of chemicals have been identified as DBPs, and a large proportion of them fall into the general category of halogenated DBPs. Because of evidence of carcinogenicity and developmental effects, halogenated DBPs are of particular interest. Two major molecular descriptors can be used to predict the toxicity of DBPs: log P and the numbers of carbon and halogen atoms. For QSAR analysis, halogenated DBPs are classified into eight classes, as illustrated in Figure 2.14.
According to the molecular structure of DBPs, Pierotti (1999) developed a DBP database containing more than one thousand DBP compounds for the eight classes of DBPs. The database includes more than 20 different physical and chemical characteristics for each DBP compound. Major variables are chemical name, address, subclass, CAS number, SMILES, disinfection process, molecular formula and structure, molecular weight, log P, cLog P, log P reference, melting point (°C), boiling point (°C), vapor pressure (mm of Hg), solubility (mol/l), EHOMO, ELUMO, surface area, MCLG (mg/l), MCL (mg/l), 10‐day exposure for a 10 kg child (mg/l), long‐term exposure for 10 kg child (mg/l), long‐term exposure for a 70 kg adult (mg/l), mg/l at 10−4 cancer risk, cancer group, SDWA reference, and toxicological effect. The database is used in developing QSAR models in the following section to illustrate statistical methods such as regression, outlier detection, quantification of uncertainty and sensitivity, and validation of QSAR models.
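Multiple linear regression (MLR) is a common algorithm for deriving QSAR models. It relates the dependent variable y to a number of independent (predictor) variables, xj, by a linear equation:

$$y = b_0 + \sum_{j=1}^{m} b_j x_j + e$$

where b0 is the intercept, bj is the regression coefficient of the jth predictor, m is the number of predictors, and e is the residual error.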
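Fitting the MLR equation means finding the coefficients b0, …, bm that minimize the sum of squared residuals. The standard‐library Python sketch below solves the normal equations (XᵀX)b = Xᵀy by Gaussian elimination. The function names are hypothetical, and a production analysis would normally use a statistics package such as the SAS software mentioned later in this section.

```python
def solve(a, b):
    """Solve the square system a.x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_mlr(xs, y):
    """Least-squares [b0, b1, ..., bm] for y = b0 + sum(bj * xj) via the normal equations."""
    rows = [[1.0] + list(x) for x in xs]  # prepend the intercept column
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)


# Toy data generated from y = 1 + 2*x1 - 0.5*x2 (illustrative only).
print(fit_mlr([(0, 0), (1, 0), (0, 1), (1, 1), (2, 3)], [1, 3, 0.5, 2.5, 3.5]))
```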
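Goodness of fit is quantified by the coefficient of multiple determination, R2, calculated by Equation (2.17). R2 estimates the proportion of the variation of y that is explained by the regression equation (Massart et al., 1997). A perfect fit gives R2 = 1, while R2 = 0 if there is no linear relationship between the dependent and independent variables:

$$R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}$$

where yi is the observed value, ŷi is the value predicted by the regression equation, ȳ is the mean of the observed values, and n is the number of observations.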
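Computing R2 directly from observed and predicted values shows its two limiting cases. A perfect prediction gives 1. Predicting every observation with the mean, which is what a model with no linear relationship reduces to, gives 0. The function name is hypothetical.

```python
def r_squared(y, y_hat):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    y_bar = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


print(r_squared([1, 2, 3], [1, 2, 3]))   # perfect fit
print(r_squared([1, 2, 3], [2, 2, 2]))   # predicting the mean only
```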
In predicting the toxicity of a chemical compound, two important molecular descriptors are hydrophobicity, such as log P, and electronic properties, such as the energy of the lowest unoccupied molecular orbital (ELUMO). Log P reflects the hydrophobicity of molecules, which often correlates well with the bioactivity of chemicals (Leo and Hansch, 1999). The logarithm of the partition coefficient (log P), also referred to as log Kow, describes the distribution of a compound between an organic phase (usually n‐octanol) and water:

$$\log P = \log_{10}\frac{C_{octanol}}{C_{water}}$$

where Coctanol and Cwater are the equilibrium concentrations of the compound in the n‐octanol and water phases, respectively.
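As a small numeric illustration of the definition, log P can be computed from equilibrium concentrations expressed in the same units. Its sign then indicates the preferred phase, as discussed next. The function names are hypothetical.

```python
import math


def log_p(c_octanol, c_water):
    """log P (log Kow) from equilibrium concentrations in n-octanol and water (same units)."""
    return math.log10(c_octanol / c_water)


def preferred_phase(logp):
    """Interpret the sign of log P."""
    if logp > 0:
        return "organic"
    if logp < 0:
        return "aqueous"
    return "equal"


print(log_p(100.0, 1.0), preferred_phase(log_p(100.0, 1.0)))
```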
If log P is greater than zero, the compound is more soluble in the organic phase; if log P is less than zero, it is more soluble in the aqueous phase. Chen et al. (2016) reported that log P can be predicted accurately from ELUMO, the number of carbon atoms (NC), and the number of chlorine atoms (NCl). A general MLR model to predict log P is

$$\log P = b_0 + b_1 E_{LUMO} + b_2 N_{Cl} + b_3 N_C$$

where b0 to b3 are regression coefficients.
To develop a robust QSAR model, outliers have to be detected and removed to improve the correlation coefficient of the QSAR model. The Hotelling test and the associated leverage statistics can be used to detect outliers. The leverage of a chemical measures its distance from the centroid of the active set; chemicals in the active set have leverage values between 0 and 1. A warning leverage (h*), defined in Equation (2.19) (Eriksson et al., 2003), is the critical value used to cut off outliers from the dataset:

$$h^* = \frac{3p'}{n}$$

where p′ is the number of model parameters (the number of descriptors plus one) and n is the number of compounds in the active (training) set.
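The leverage of compound i is the ith diagonal element of the hat matrix, h_i = x_iᵀ(XᵀX)⁻¹x_i, where x_i includes the intercept term. The warning leverage is taken here as h* = 3p′/n, the common reading of Eriksson et al. (2003), which is assumed here. The pure‐Python sketch below computes both. The function names are hypothetical.

```python
def solve(a, b):
    """Solve the square system a.x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def leverages(xs):
    """Hat-matrix diagonal h_i = x_i^T (X^T X)^-1 x_i, with an intercept column."""
    rows = [[1.0] + list(x) for x in xs]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    # Solve (X^T X) v = x_i instead of forming the inverse explicitly.
    return [sum(ri * vi for ri, vi in zip(r, solve(xtx, r))) for r in rows]


def warning_leverage(n_compounds, n_descriptors):
    """h* = 3 p' / n with p' = number of descriptors + 1."""
    return 3.0 * (n_descriptors + 1) / n_compounds


h = leverages([(1,), (2,), (3,), (4,), (5,)])
print(h, warning_leverage(5, 1))
```

The leverages always sum to p′, which is a quick check on the computation.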
The Williams plot is a double‐ordinate Cartesian plot. It shows cross‐validation residuals (first ordinate) and standardized residuals (second ordinate) against leverage values (hat diagonal, h; abscissa). The plot defines the domain of applicability of the model as the area within a ±2 band for residuals and below the leverage threshold h*.
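The domain‐of‐applicability rule read off a Williams plot can be written as a simple classification of each compound. The sketch uses the ±2 residual band stated above and a supplied h*. The labels and the function name are hypothetical.

```python
def williams_labels(std_residuals, leverages, h_star, band=2.0):
    """Classify compounds by the Williams-plot rule (|residual| <= band and h <= h*)."""
    labels = []
    for r, h in zip(std_residuals, leverages):
        response = abs(r) > band    # outside the residual band
        structural = h > h_star     # beyond the leverage threshold
        if response and structural:
            labels.append("response outlier and high leverage")
        elif response:
            labels.append("response outlier")
        elif structural:
            labels.append("high leverage")
        else:
            labels.append("inside domain")
    return labels


print(williams_labels([0.5, 2.5, 0.1, -3.0], [0.1, 0.1, 0.9, 0.9], h_star=0.6))
```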
Figure 2.16 is a box plot showing the contribution of each variable (ELUMO, NCl, and NC) in terms of standardized coefficients after the outliers were removed. It suggests that NC contributes to log P most positively, while ELUMO contributes negatively to a lesser extent. NCl contributes to log P positively, but less than both NC and ELUMO.
Figure 2.17 shows that all the predicted log P values fall within the 95% confidence band of the observed log P values.
QSAR models should be balanced between the two extremes of overfitting and underfitting through model validation. Model validation covers internal model performance (goodness of fit and robustness) and external model performance (predictivity). The OECD (2007) recommends cross‐validation, bootstrapping, the response randomization test, and training/test set splitting. In cross‐validation, a number of modified datasets are created by deleting one or a small group of compounds from the data, in such a way that each compound is removed once. From the original dataset, a reduced dataset (training set or active set) is used to develop a partial model. The remaining data (validation set) are used to evaluate the model predictivity (Efron, 1983; Osten, 1988).
The leave‐one‐out (LOO) method is the simplest cross‐validation procedure, in which each compound is removed one at a time. For n compounds, n reduced models are developed. Each is fitted with the remaining n − 1 compounds and used to predict the response of the deleted compound. The predictive power of the model is then quantified by the predictive residual sum of squares (PRESS), the sum of squared differences between the observed and predicted responses of the deleted compounds. The LOO method quantifies each compound's impact on the developed QSAR model precisely. However, its estimate of predictive power is often too optimistic, particularly for large datasets, because removing a single compound perturbs the model only slightly.
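The LOO procedure can be sketched in a few lines. For brevity the sketch uses a one‐descriptor linear model; any QSAR fitting routine could be substituted. It accumulates PRESS and reports Q² = 1 − PRESS/SS_total, a common cross‐validated analogue of R2. The function names are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = b0 + b1*x (one descriptor, for brevity)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    return y_bar - b1 * x_bar, b1


def loo_cross_validation(xs, ys):
    """Return PRESS and Q2 = 1 - PRESS / SS_total from leave-one-out refits."""
    press = 0.0
    for i in range(len(xs)):
        # Refit without compound i, then predict compound i.
        b0, b1 = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        press += (ys[i] - (b0 + b1 * xs[i])) ** 2
    y_bar = sum(ys) / len(ys)
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return press, 1.0 - press / ss_tot


print(loo_cross_validation([1, 2, 3, 4], [1, 2, 3, 5]))
```

For noisy data, Q² from LOO is lower than the R2 of the full fit, because each compound is predicted by a model that never saw it.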
The leave‐many‐out (LMO) cross‐validation method removes more than one compound at a time. The dataset is divided into a number of blocks (referred to as cancelation groups). Each time, all the compounds belonging to one block are left out of the derivation of the model. Compared with the LOO method, the LMO method gives a more realistic estimate of predictive power because it introduces a larger perturbation in the dataset. There is no standard rule for splitting the data into blocks, so the split is normally defined by the model user. A random clustering method can also be used, in which one or more compounds are randomly assigned to the cancelation block and left out of the derivation of the QSAR model.
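A sketch of LMO with random cancelation groups follows. The compounds are shuffled with a fixed seed and dealt into k blocks, and each block is left out once. The number of blocks, the seed, the one‐descriptor model, and the function names are all illustrative assumptions.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = b0 + b1*x (one descriptor, for brevity)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    return y_bar - b1 * x_bar, b1


def make_blocks(n, k, seed=0):
    """Randomly assign compound indices 0..n-1 to k cancelation groups."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def lmo_cross_validation(xs, ys, k=3, seed=0):
    """Return PRESS and Q2 when each block is left out of the fit once."""
    press = 0.0
    for block in make_blocks(len(xs), k, seed):
        left_out = set(block)
        xt = [x for i, x in enumerate(xs) if i not in left_out]
        yt = [y for i, y in enumerate(ys) if i not in left_out]
        b0, b1 = fit_line(xt, yt)
        press += sum((ys[i] - (b0 + b1 * xs[i])) ** 2 for i in block)
    y_bar = sum(ys) / len(ys)
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return press, 1.0 - press / ss_tot


print(lmo_cross_validation(list(range(10)), [2 * x + 1 + (-1) ** x * 0.3 for x in range(10)]))
```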
All models have intrinsic uncertainty that has to be quantified in order to communicate risks to the public. Constraints, uncertainties, and assumptions that affect the risk assessment should be explicitly considered at each step. A quantitative description of uncertainty in HRA, carrying capacity, and climate change is therefore critical for policy makers and the general public. For example, because of the difficulty of quantifying all the consequences of different emissions, the IPCC Fifth Assessment (2015) adopted the following technical terms for dealing with uncertainty:
For policy makers, the IPCC adopted the following terms to express the assessed likelihood of an outcome or a result quantitatively. Seven descriptive terms are assigned to seven likelihood probability ranges, as shown in Table 2.19.
Table 2.19 Technical terms used for describing the probability of an outcome (IPCC, 2013).
Term | Likelihood probability of an outcome (%) |
Virtually certain | 99–100 |
Very likely | 90–100 |
Likely | 66–100 |
About as likely as not | 33–66 |
Unlikely | 0–33 |
Very unlikely | 0–10 |
Exceptionally unlikely | 0–1 |
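The ranges in Table 2.19 overlap; for example, 95% falls under both "Likely" and "Very likely". Mapping a probability to a single term therefore needs a tie‐breaking rule. The sketch assumes the narrowest range containing the probability is chosen. That is a convention adopted here for illustration, not a rule stated by the IPCC.

```python
# Ranges from Table 2.19, as fractions.
TERMS = [
    ("Virtually certain", 0.99, 1.00),
    ("Very likely", 0.90, 1.00),
    ("Likely", 0.66, 1.00),
    ("About as likely as not", 0.33, 0.66),
    ("Unlikely", 0.00, 0.33),
    ("Very unlikely", 0.00, 0.10),
    ("Exceptionally unlikely", 0.00, 0.01),
]


def likelihood_term(p):
    """Return the narrowest Table 2.19 term whose range contains probability p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    matches = [(high - low, name) for name, low, high in TERMS if low <= p <= high]
    return min(matches)[1]


for p in (0.995, 0.95, 0.7, 0.5, 0.2, 0.05, 0.005):
    print(p, likelihood_term(p))
```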
To quantify the different types of uncertainty, the US EPA distinguishes scenario, parameter, and model uncertainty. For each category, the sources of uncertainty are identified with a specific example in Table 2.20.
Table 2.20 Type of uncertainty with sources and examples (US EPA).
Type of uncertainty | Sources | Examples |
Scenario uncertainty | Descriptive errors | Incorrect or insufficient information |
Scenario uncertainty | Aggregation errors | Spatial or temporal approximations |
Scenario uncertainty | Judgment errors | Selection of an incorrect model |
Scenario uncertainty | Incomplete analysis | Overlooking an important pathway |
Parameter uncertainty | Measurement errors | Imprecise or biased measurements |
Parameter uncertainty | Sampling errors | Small or unrepresentative samples |
Parameter uncertainty | Variability | In time, space, or activities |
Parameter uncertainty | Surrogate data | Structurally related chemicals |
Model uncertainty | Relationship errors | Incorrect inference on the basis of correlations |
Model uncertainty | Modeling errors | Excluding relevant variables |
Furthermore, the US EPA lists different quantification methods of uncertainty with examples in Table 2.21.
Table 2.21 Approaches to quantitative analysis of uncertainty.
Approach | Description | Examples |
Sensitivity analysis | Changing one input variable at a time while leaving others constant, to examine effect on output | Fix each input at lower (then upper) boundary while holding others at nominal values (e.g. medians) |
Analytical uncertainty propagation | Examining how uncertainty in individual parameters affects the overall uncertainty of the exposure assessment | Analytically or numerically obtain a partial derivative of the exposure equation with respect to each input parameter |
Probabilistic uncertainty analysis | Varying each of the input variables over various values of their respective probability distributions | Assign probability density function to each parameter; randomly sample values from each distribution and insert them in the exposure equation (Monte Carlo) |
Classical statistical methods | Estimating the population exposure distribution directly, based on measured values from a representative sample | Compute confidence interval estimates for various percentiles of the exposure distribution |
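The first approach in Table 2.21, one‐at‐a‐time sensitivity analysis, can be sketched on a generic intake equation. The intake equation, the nominal values, and the bounds below are hypothetical placeholders chosen only to show the procedure. Each input is set to its lower and then its upper bound while all others stay at their nominal values.

```python
def intake(p):
    """Hypothetical chronic daily intake (mg/kg-day) = C * IR * EF * ED / (BW * AT)."""
    return p["C"] * p["IR"] * p["EF"] * p["ED"] / (p["BW"] * p["AT"])


def one_at_a_time(model, nominal, bounds):
    """Evaluate the model with each input set to its lower, then upper, bound."""
    base = model(nominal)
    results = {}
    for name, (low, high) in bounds.items():
        outputs = []
        for value in (low, high):
            trial = dict(nominal)  # all other inputs stay at nominal values
            trial[name] = value
            outputs.append(model(trial))
        results[name] = tuple(outputs)
    return base, results


NOMINAL = {"C": 0.01, "IR": 2.0, "EF": 350, "ED": 30, "BW": 70, "AT": 70 * 365}
BOUNDS = {"C": (0.005, 0.02), "IR": (1.0, 3.0), "BW": (50, 90)}

base, results = one_at_a_time(intake, NOMINAL, BOUNDS)
for name, (lo, hi) in results.items():
    print(f"{name}: {lo / base:.2f}x to {hi / base:.2f}x of nominal intake")
```

This method ignores interactions between inputs. The probabilistic approach in the same table addresses that limitation.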
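Consider a QSAR model in which the result r is a function of j variables X1, X2, …, Xj:

$$r = r(X_1, X_2, \ldots, X_j)$$

For independent variables, the uncertainty in r due to errors in the variables can be expressed as follows (Coleman and Steele, 1989):

$$u_r = \sqrt{\sum_{i=1}^{j}\left(\frac{\partial r}{\partial X_i}\, u_{X_i}\right)^2}$$

where uXi is the uncertainty of variable Xi.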
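The root‐sum‐square propagation can be sketched with partial derivatives approximated by central differences, so the same routine works for any model function. The linear log P model, its coefficients, and the input uncertainties are hypothetical placeholders, not the fitted values in this chapter. For a linear model, the partial derivatives are simply the coefficients.

```python
import math

B0, B = 0.5, (-0.3, 0.6, 0.45)  # hypothetical coefficients for ELUMO, NCl, NC


def log_p_model(x):
    """Hypothetical linear QSAR: log P = b0 + b1*ELUMO + b2*NCl + b3*NC."""
    return B0 + sum(b * xi for b, xi in zip(B, x))


def propagate(f, x, u, rel_step=1e-6):
    """Root-sum-square uncertainty of f at x for independent inputs with uncertainties u.

    Partial derivatives are approximated by central differences.
    """
    total = 0.0
    for i in range(len(x)):
        h = rel_step * max(abs(x[i]), 1.0)
        up, down = list(x), list(x)
        up[i] += h
        down[i] -= h
        dfdx = (f(up) - f(down)) / (2.0 * h)
        total += (dfdx * u[i]) ** 2
    return math.sqrt(total)


print(propagate(log_p_model, [-1.0, 2.0, 3.0], [0.2, 0.5, 0.3]))
```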
When this uncertainty equation is applied to log P as the function r, with ELUMO, NCl, and NC as the three variables, the uncertainty of log P can be expressed by Equations 2.25 and 2.26:
Sometimes the data reduction expression is very complex and computing the partial derivatives in the above equations is extremely laborious. In that case, Monte Carlo simulation (MCS) can be used instead (Cox et al., 2001), since it quantifies uncertainty by sampling directly from the distributions of the independent variables. MCS is used here as an example to quantify the uncertainty of the predicted log P, with ELUMO, NCl, and NC as the three independent variables.
Monte Carlo simulation (MCS) has been used extensively to quantify the uncertainty of linear equations (Tang et al., 2009). Estimating the propagation of error distributions by MCS is based on theoretical principles and supports a fully consistent and transferable estimation of measurement uncertainty. In the general case, an output Z is measured indirectly from the input variables X1, X2, …, Xn through a functional relationship F:

$$Z = F(X_1, X_2, \ldots, X_n)$$
The knowledge about the values that may reasonably be attributed to the quantities Xi, considered as continuous random variables, is expressed by their probability density functions (PDFs), $g_{X_i}(\xi_i)$, within the corresponding domains. The PDF gives two quantities for each Xi. The first is the expectation, or best estimate, E(Xi). The second is its standard uncertainty u(Xi), taken as the standard deviation σ(Xi):

$$E(X_i) = \int \xi_i\, g_{X_i}(\xi_i)\, d\xi_i, \qquad u^2(X_i) = \sigma^2(X_i) = \int \left(\xi_i - E(X_i)\right)^2 g_{X_i}(\xi_i)\, d\xi_i$$

Suppose F is linearized by a first‐order Taylor expansion about the point μ = (μ1, μ2, …, μn), where μi = E(Xi), and the second‐ and higher‐order terms are neglected. The output is then approximated by

$$Z \approx F(\boldsymbol{\mu}) + \sum_{i=1}^{n} \frac{\partial F}{\partial X_i}\bigg|_{\boldsymbol{\mu}} \left(X_i - \mu_i\right)$$

Taking E(Z) ≈ F(μ) = F(μ1, μ2, …, μn) leads to the well‐known law of propagation of uncertainty:

$$u^2(Z) = \sum_{i=1}^{n}\left(\frac{\partial F}{\partial X_i}\right)^2 u^2(X_i) + 2\sum_{i=1}^{n-1}\sum_{j=i+1}^{n}\frac{\partial F}{\partial X_i}\frac{\partial F}{\partial X_j}\,\mathrm{cov}(X_i, X_j)$$
The Guide to the Expression of Uncertainty in Measurement (GUM, 1993) is the internationally accepted master document for the evaluation of uncertainty. In the GUM, the combined standard uncertainty u(Z) is evaluated from the standard uncertainties of the input variables, u(Xi), and the covariances between correlated inputs, cov(Xi, Xj). If all the input variables are independent, cov(Xi, Xj) = 0. The essence of MCS is to simulate sampling from a target population with a given expectation μZ and variance σ2(Z). A random sample of size M is obtained by simulating M independent and identically distributed random variables Z1, Z2, …, ZM. According to the central limit theorem (Martinez, 2002), the distribution of their sum SM is approximately normal, with expectation MμZ and variance Mσ2(Z):

$$S_M = \sum_{m=1}^{M} Z_m \sim N\left(M\mu_Z,\ M\sigma^2(Z)\right)$$

According to the "two‐sigma" rule, the probability that SM lies within two standard deviations of its expectation is approximately 95%:

$$P\left(\left|S_M - M\mu_Z\right| < 2\sigma(Z)\sqrt{M}\right) \approx 0.95$$

Dividing both sides of the inequality in Equation (2.33) by M gives

$$P\left(\left|\frac{S_M}{M} - \mu_Z\right| < \frac{2\sigma(Z)}{\sqrt{M}}\right) \approx 0.95$$

Substituting the sample mean $\bar{Z} = S_M/M$ into Equation (2.34) gives

$$P\left(\left|\bar{Z} - \mu_Z\right| < \frac{2\sigma(Z)}{\sqrt{M}}\right) \approx 0.95$$

This relationship is the foundation of MCS because it establishes the rule for evaluating the error of $\bar{Z}$ as $2\sigma(Z)/\sqrt{M}$, which shrinks as the number of trials M grows. The variance σ2(Z) may be estimated by the sample variance:

$$s^2(Z) = \frac{1}{M-1}\sum_{m=1}^{M}\left(Z_m - \bar{Z}\right)^2$$

After the coverage probability P is defined (here 95%), the confidence interval for the result is evaluated as $[Z_{low}, Z_{high}]$. Its extremes correspond to the 2.5th and 97.5th percentiles of the sorted Z values. When the skewness of the simulated Z distribution is near zero, the confidence interval becomes symmetric, and the expanded uncertainty U(Z) is estimated by Equation (2.38):

$$U(Z) = \frac{Z_{high} - Z_{low}}{2}$$
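Given the simulated output values, the summary statistics described above take a few lines with Python's standard `statistics` module. The sketch computes the mean, the sample standard deviation (divisor M − 1), the two‐sigma error of the mean 2s/√M, the 2.5th and 97.5th percentiles, and U(Z). It assumes U(Z) is taken as half the width of the 95% interval. The function name and the demo distribution are hypothetical.

```python
import math
import random
import statistics


def summarize_mc(z):
    """Summary statistics of Monte Carlo output values z (needs at least two values)."""
    m = len(z)
    mean = statistics.fmean(z)
    s = statistics.stdev(z)                 # sample SD, divisor M - 1
    cuts = statistics.quantiles(z, n=40)    # cut points at 2.5%, 5%, ..., 97.5%
    z_low, z_high = cuts[0], cuts[-1]
    return {
        "mean": mean,
        "s": s,
        "mc_error": 2.0 * s / math.sqrt(m),  # two-sigma error of the mean
        "z_low": z_low,
        "z_high": z_high,
        "U": (z_high - z_low) / 2.0,         # half-width of the 95% interval
    }


rng = random.Random(0)
demo = summarize_mc([rng.gauss(2.0, 0.5) for _ in range(10_000)])
print({key: round(value, 4) for key, value in demo.items()})
```

For a normal output, U(Z) should come out close to 1.96 standard deviations, and the error of the mean shrinks as 1/√M.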
MCS is a powerful tool for quantifying uncertainty, complementing experimental, analytical, and other numerical methods. An MCS protocol replaces point estimates with random variables drawn from probability density functions. To perform MCS, a computer program generates pseudorandom numbers to simulate the values of the variables within a given PDF. Commercial programs such as Crystal Ball, LabVIEW, @RISK, and Analytica are popular for this purpose. If Crystal Ball by Oracle (2016) is used to carry out the MCS, there are three major steps:
- First, a probability distribution is assigned to a spreadsheet cell. Each time the spreadsheet is recalculated, a new value of the random variable is selected from the distribution and used in the calculations.
- Second, the simulation is repeated for a large number of trials, at least 10 000. Each trial selects new values of the random variables and generates a new estimate of the final target.
- Third, the results of the simulations are summarized in a user‐friendly form such as a table or a figure.
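The three steps can be mirrored without a spreadsheet, as in the Python sketch below. Each trial draws the inputs, recalculates the model, and stores the result; the stored results are then summarized. The coefficients and the normal input distributions (mean, standard deviation) for ELUMO, NCl, and NC are hypothetical placeholders, not values from the DBP database. Treating NCl and NC as continuous normal variables is likewise an assumption of this sketch.

```python
import random
import statistics

# Hypothetical MLR coefficients and input distributions (normal: mean, sd).
COEFS = {"b0": 0.5, "ELUMO": -0.3, "NCl": 0.6, "NC": 0.45}
INPUTS = {"ELUMO": (-1.0, 0.2), "NCl": (2.0, 0.5), "NC": (3.0, 0.3)}


def predict_log_p(x):
    """log P = b0 + b1*ELUMO + b2*NCl + b3*NC."""
    return COEFS["b0"] + sum(COEFS[name] * x[name] for name in INPUTS)


def run_mcs(trials=10_000, seed=42):
    """Return the simulated log P values, one per trial."""
    rng = random.Random(seed)
    results = []
    for _ in range(trials):
        # Step 1: draw each input from its assumed distribution.
        x = {name: rng.gauss(mu, sd) for name, (mu, sd) in INPUTS.items()}
        # Step 2: recalculate the model for this trial.
        results.append(predict_log_p(x))
    return results


# Step 3: summarize the forecast distribution.
z = run_mcs()
cut = statistics.quantiles(z, n=40)
print(f"mean={statistics.fmean(z):.3f}  sd={statistics.stdev(z):.3f}  "
      f"95% interval=[{cut[0]:.3f}, {cut[-1]:.3f}]")
```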
A step‐by‐step procedure for the uncertainty assessment of the regression QSAR model for halogenated DBPs is illustrated in Figure 2.21. The first step is the compilation of the DBP data. Three variables were used: ELUMO, the number of chlorine atoms (NCl), and the number of carbon atoms (NC). Their probability distributions are characterized with the statistical software SAS 9.4 to obtain the mean and standard deviation of each variable. These parameters are fed into the MCS, which gives the results as a probability distribution around a mean value; this distribution is then used for a detailed sensitivity analysis. The sensitivity analysis identifies which parameters have the most impact on the predicted log P. If small changes in one parameter, characterized by a probability distribution, strongly influence the final result, the sensitivity to that variable is high. Sensitivity is therefore crucial in determining which variables are most important in predicting a dependent outcome. It can be displayed as the percentage contribution of each parameter to the variance of the final result. Crystal Ball presents the contribution of each independent variable as a percentage, with the absolute contributions summing to 100%. A general procedure for quantifying the uncertainty of the predicted log P of a DBP class through MCS is outlined as follows:
Example 2.9 presents the MCS results produced by Crystal Ball software. The methodology can be used for the quantification of uncertainties of any regression curve in general.
The uncertainty of log P could also be estimated by the point estimate method (PEM). In most cases, uncertainty quantified by MCS should be more accurate than that calculated by PEM, because MCS accounts for the full distributions of the inputs and their propagation. The uncertainty results obtained from PEM and MCS for the QSAR models are compared in Example 2.10.
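For comparison with MCS, the sketch below implements a point estimate method. It assumes that PEM here means Rosenblueth's two‐point scheme for uncorrelated, symmetric inputs: the model is evaluated at all 2ⁿ combinations of μi ± σi, each with weight 1/2ⁿ. The coefficients and input statistics are the same hypothetical placeholders used in the MCS sketch. For a linear model like this one, the two‐point estimates of mean and standard deviation are exact. Differences from MCS would appear only for nonlinear models or skewed inputs, and only MCS yields percentiles directly.

```python
import itertools
import math

B0, B = 0.5, (-0.3, 0.6, 0.45)  # hypothetical coefficients for ELUMO, NCl, NC


def log_p_model(x):
    """Hypothetical linear QSAR: log P = b0 + b1*ELUMO + b2*NCl + b3*NC."""
    return B0 + sum(b * xi for b, xi in zip(B, x))


def rosenblueth_pem(f, means, sds):
    """Mean and SD of f from 2^n equally weighted points at mean +/- sd."""
    values = [
        f([m + s * sd for m, s, sd in zip(means, signs, sds)])
        for signs in itertools.product((-1.0, 1.0), repeat=len(means))
    ]
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, math.sqrt(var)


print(rosenblueth_pem(log_p_model, [-1.0, 2.0, 3.0], [0.2, 0.5, 0.3]))
```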
The influence of the different variables on the model prediction can be presented visually through sensitivity analysis in Crystal Ball. Table 2.25 summarizes the sensitivity results for each DBP class. Figure 2.26 shows that NCl is the most influential molecular descriptor in predicting log P for all DBP classes except halogenated alkanes.
Table 2.25 Sensitivity of log P to various descriptors by DBP class.
No. | Compound class | ELUMO (%) | NCl (%) | NC (%) |
1 | Halogenated alkane | −16.0 | 0.8 | 83.2 |
2 | Halogenated alkene | −44.0 | 53.9 | 2.1 |
3 | Halogenated aromatic | 0.6 | 90.6 | 8.7 |
4 | Halogenated aldehyde | −7.3 | 64.9 | 27.8 |
5 | Halogenated ketone | 0.0 | 93.0 | 7.0 |
6 | Halogenated carboxylic acid | 36.5 | 55.6 | 7.9 |
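Signed percentages like those in Table 2.25 can be reproduced in spirit from simulation output. The sketch assumes Crystal Ball's contribution to variance is obtained by squaring the rank correlation between each input and the output, normalizing the squares to 100%, and keeping the sign of the correlation. The coefficients and input distributions are hypothetical, so the percentages are not those of any DBP class.

```python
import math
import random

B0, B = 0.5, {"ELUMO": -0.3, "NCl": 0.6, "NC": 0.45}  # hypothetical coefficients
INPUTS = {"ELUMO": (-1.0, 0.2), "NCl": (2.0, 0.5), "NC": (3.0, 0.3)}


def ranks(values):
    """Rank positions 0..n-1 (continuous draws, so ties are assumed absent)."""
    order = sorted(range(len(values)), key=values.__getitem__)
    r = [0] * len(values)
    for position, index in enumerate(order):
        r[index] = position
    return r


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def contribution_to_variance(draws, outputs):
    """Signed % contributions from normalized squared Spearman rank correlations."""
    out_ranks = ranks(outputs)
    rho = {name: pearson(ranks(v), out_ranks) for name, v in draws.items()}
    total = sum(r * r for r in rho.values())
    return {name: 100.0 * math.copysign(r * r, r) / total for name, r in rho.items()}


def simulate(trials=5000, seed=1):
    """Draw inputs and compute the hypothetical log P for each trial."""
    rng = random.Random(seed)
    draws = {n: [rng.gauss(mu, sd) for _ in range(trials)] for n, (mu, sd) in INPUTS.items()}
    outputs = [B0 + sum(B[n] * draws[n][i] for n in B) for i in range(trials)]
    return draws, outputs


print({k: round(v, 1) for k, v in contribution_to_variance(*simulate()).items()})
```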
Table 2.25 shows a major difference between the sensitivities of compounds with only single bonds, such as halogenated alkanes, and those of the other classes. The log P of halogenated alkanes changes significantly with the length of the carbon chain, NC. For all the other classes of DBPs, which contain unsaturated π bonds, log P changes mostly with the number of chlorine atoms, NCl. In the chlorinated alkanes, which contain no oxygen, log P reflects the extent of hydrogen bonding between the compound and the hydrogen of water molecules. It appears that a chlorine atom attached to carbon influences only the electron cloud of the attached carbon, and this electronic interaction is tightly bound. The number of chlorine atoms on the chlorinated alkane carbon chain therefore does not significantly influence the strength of hydrogen bonds. In DBP compounds containing unsaturated π bonds, however, the chlorine atom attracts the electron cloud toward itself. Most compounds containing unsaturated bonds may have a resonance structure through which the influence of chlorine is transmitted. As a result, the strength of the hydrogen bonds between the compound and water molecules is significantly influenced by the number of chlorine atoms, NCl, and log P is therefore more sensitive to NCl.
Many software packages are available for conducting risk assessment. Table 2.26 lists software for quantitative risk assessment.
Table 2.26 Software for quantitative risk assessment.
Software | Type of analysis | Creator |
@RISK | Uncertainty and risk analysis | Palisade Corporation |
Analytica | Uncertainty and risk analysis | Lumina Decision Systems, Inc. |
GENII/SUNS | Uncertainty and sensitivity analysis | Sandia National Laboratories; Pacific Northwest Laboratory |
MOUSE | Uncertainty analysis | EPA, Risk Reduction Engineering Laboratory |
ORMONTE | Uncertainty and sensitivity analysis | Oak Ridge National Laboratory |
Risk Calc | Uncertainty and risk analysis | Applied Biomathematics |
SimLab | Uncertainty and sensitivity analysis | Simlab |
TAM3 | Uncertainty and sensitivity analysis | Oak Ridge National Laboratory |
Uncertainty Analysis | Uncertainty analysis | Integrated Sciences Group |
Crystal Ball® | Uncertainty and sensitivity analysis | Oracle |
Under the heading of “Mortality and global health estimates,” you will be able to download the following database:
Life expectancy for different countries. Use SPSS to conduct the following:
Collect water quality and air quality data for Xiongan and conduct a health risk assessment on the following major primary pollutants. Exposure factors can be found in the US EPA Exposure Factors Handbook, 2011 Edition:
Answer the following questions: