
2
Health Risk Assessment

If the Earth were sick, nobody would be healthy.

2.1 Environmental Health

Human activities significantly degrade the quality of air, water, and soil, pose a great threat to human health, and reduce human life expectancy. Taking air pollution as an example, the primary and secondary annual ambient air quality standards for particulate matter with a diameter of 2.5 µm or less (PM2.5) in the United States are 12 and 15 µg/m³, respectively (US EPA, 2016). The corresponding standards for PM2.5 in China are 15 and 35 µg/m³, respectively (China MEP, 2016). However, the average PM2.5 concentration in 10 Chinese provinces exceeded 55 µg/m³ in 2015. According to data from the Administration of Energy of China, 64% of China’s primary energy comes from coal, and China, with a population of 1.4 billion, burns about 47% of the world’s total coal consumption. As shown in Figure 2.1, the annual average PM2.5 for the 10 most polluted provinces in China is at least twice the secondary standard of 35 µg/m³.

China’s 10 most polluted provinces in 2015. The graph has 10 descending columns with 3 horizontal lines. Columns are along Henan, Beijing, Hebei, Tianjin, Shandong, Hubei, Jiangsu, Shanxi, Anhui, and Chongqing. — **Figure 2.1** China’s 10 most polluted provinces in 2015.

(*Source:* From the US EPA and China MEP (2016).)

In China, mortality rates due to environmental damage are significantly higher than natural mortality rates. Lung, liver, and stomach cancers are the top three causes of cancer death in China. Epidemiological studies associate lung cancer with smog from coal burning, liver cancer with drinking water and strong liquor, and stomach cancer with food contaminated through soil pollution. According to the National Bureau of Statistics of China (NBSC), about 90% of the 161 cities whose air quality was monitored in 2014 failed to meet Chinese official standards. More than half of China’s surface water is so polluted that it cannot be used as a drinking water source. An estimated 4000 people die every day in China from air pollution, with coal burning as the likely principal cause. Owing to its severity, air pollution is the leading cause of lung cancer and respiratory infections. Figure 2.2 shows the seven leading causes of death in the world in 2012. Worldwide, the most common causes of cancer death are (i) lung cancer with 1.59 million deaths, (ii) liver cancer with 745 000 deaths, (iii) stomach cancer with 723 000 deaths, (iv) colorectal cancer with 694 000 deaths, (v) breast cancer with 521 000 deaths, and (vi) esophageal cancer with 400 000 deaths.

Graph for the 7 leading causes of death in the world in 2012. The graph has 7 columns, along ischaemic heart disease, stroke, COPD, lower respiratory infections, etc. Ischaemic heart disease has the highest peak. — **Figure 2.2** Seven leading causes of death in the world in 2012.

(*Source:* Data from WHO (2014).)

Worldwide, environmental pollution is also the leading cause of cancer and contributes to major premature mortality. WHO (2014) estimated that 3.1, 1.6, and 1.5 million people died from lower respiratory infections, trachea, bronchus, and lung cancers, and diarrheal diseases, respectively, as shown in Figures 2.2 and 2.3. There were approximately 14 million new cancer cases in 2012, and this number is expected to rise by about 70% over the next two decades (WHO, 2014). In 2012, ischemic heart disease, stroke, and chronic obstructive pulmonary disease (COPD) caused 7.4, 6.7, and 3.1 million deaths, respectively (NBSC, 2012).

Pie graph of 7 leading causes of death: 29.7% ischaemic heart disease; 26.9% stroke; 12.5% COPD and lower respiratory infections; 6.4% tracheal bronchus, lung cancers; and 6% HIV/AIDS and diarrheal diseases. — **Figure 2.3** Seven leading causes of death (percent) in the world.

(*Source:* Data from WHO (2014).)

As shown in Figure 2.4, malignant tumors, the leading cause of death, caused 168 and 159 deaths per 100 000 population in urban and rural China, respectively, accounting for about 28% and 25% of deaths in urban and rural China.

Clustered bars for mortality rate of malignant tumor; heart, cerebrovascular, and respiratory system diseases; external causes of injury and poison; and other causes of death in urban and rural China in 2009. — **Figure 2.4** Leading causes of death in urban and rural China in 2009.

(*Source:* Data from National Bureau of Statistics of China (2012).)

Figures 2.5 and 2.6 provide further evidence of the environmental health issue. Zhao et al. (2010) showed that lung, liver, stomach, esophageal, and colorectal cancers accounted for 32%, 19%, 18%, 9%, and 8% of cancer deaths in urban China, and for 24%, 23%, 23%, 15%, and 5% in rural China, respectively, as shown in Figure 2.6.

2 Pie graphs of leading causes of death (percent), such as malignant tumor and heart diseases in urban (left) and rural (right) China in 2009. Both graphs indicate highest rate for malignant tumor. — **Figure 2.5** Leading causes of death (percent) in urban and rural China in 2009.

(*Source:* Data from National Bureau of Statistics of China (2012).)

Pie graphs of top 10 cancers, such as lung cancer, liver cancer, stomach cancer, and esophageal cancer in urban (left) and rural (right) China in 2004–2005. Both graphs indicate highest rate for lung cancer. — **Figure 2.6** Top 10 cancers in urban and rural China in 2004–2005.

(*Source:* Zhao et al. (2010). Reproduced with permission of Oxford University Press.)

To monitor progress in protecting human health, the United Nations uses access to improved sanitation facilities as a metric. Improved sanitation is defined as piped sewerage, septic tanks, and pit latrines with slabs or composting toilets that are not shared with other households. From 1990 to 2015, global coverage of improved sanitation rose from 54% to about 68%, as shown in Figure 2.7. This figure, however, missed the Millennium Development Goal (MDG) target by 9 percentage points. In 2015, about 946 million people still practiced open defecation worldwide (WHO, 2016).

Vertical bars indicating approximately 4.9, 0.9, 0.7, and 0.6 billion sanitation facilities with improved sanitation, open defecation, unimproved sanitation, and shared sanitation, respectively, in 2015. — **Figure 2.7** Global sanitation facilities in 2015.

(*Source:* Data from WHO (2016).)

To achieve the Sustainable Development Goals (SDG) target, cities all over the world need to reduce open defecation, promote handwashing, and improve the management and treatment of fecal wastes from both sewered and on‐site systems. For drinking water, the UN uses piped water on premises and public standpipes, boreholes, protected wells, springs, and rainwater as indicators of improved drinking water. In 2015, 6.6 billion people used improved drinking water, while 0.7 billion still used an unimproved drinking water source or surface water (WHO, 2016), as shown in Figure 2.8.

Vertical bars indicating population of (highest–lowest) piped water on premises; public standpipes, boreholes, protected wells and springs, and rainwater; unimproved water sources; and surface water. — **Figure 2.8** Global drinking water sources.

(*Source:* Data from WHO (2016).)

About one‐quarter of improved sources are contaminated with feces, and approximately 1.8 billion people drink water with such contamination (WHO, 2016). As a result, only 68% of people in urban areas and 20% in rural areas have truly safe drinking water. Unsafe drinking water has been linked to liver cancer through disinfection by‐products (DBPs) and other contaminants, such as antibiotics.

2.2 Environmental Standards

The US EPA establishes environmental standards using health risk assessment (HRA), treating carcinogenic and noncarcinogenic chemicals differently. For chemicals that are known or expected to cause adverse health effects, the EPA establishes an enforceable maximum contaminant level (MCL) and a nonenforceable maximum contaminant level goal (MCLG). The Safe Drinking Water Act (SDWA) establishes standards and health advisories (HAs) for DBPs. HRA quantifies factors such as absorption during ingestion, pharmacokinetics, mutagenicity, reproductive and developmental effects, and carcinogenicity (Pontius, 1990). MCLs are federally enforceable limits for contaminants in drinking water, established as the national primary drinking water regulations (NPDWRs). Secondary MCLs are established under the SDWA to protect public welfare with respect to properties such as odor, taste, and appearance. HAs for drinking water contaminants are levels considered to be without appreciable health risk for specific durations of exposure and are not legally enforceable. Similarly, MCLGs are nonenforceable health goals, set at levels at which no known or anticipated adverse effects on human health occur, with an adequate margin of safety. Table 2.1 describes the SDWA standards and HA categories.

Table 2.1 SDWA standards and health advisories.

Standards
MCL (maximum contaminant level)	The enforceable concentration that is provided to public water system users
MCLG (maximum contaminant level goal)	The nonenforceable concentration that protects humans from adverse effects
Health advisories
RfD (reference dose)	An estimate of the daily human exposure that is likely to be without appreciable risk of adverse effects over a lifetime
DWEL (drinking water equivalent level)	An estimate of the lifetime exposure concentration in drinking water that protects humans from adverse noncarcinogenic health effects, assuming drinking water is the sole source of exposure
One‐day exposure	The drinking water concentration that is not expected to cause adverse noncarcinogenic effects if exposure continues for five consecutive days
10‐Day exposure	The drinking water concentration that is not expected to cause adverse noncarcinogenic effects if exposure continues for 14 consecutive days
Long‐term exposure	The drinking water concentration that is not expected to cause adverse noncarcinogenic effects if exposure continues for 10% of the person’s lifetime
Lifetime HA	The drinking water concentration that is not expected to cause adverse noncarcinogenic effects if exposure continues for a lifetime

To assess an individual’s risk, animal bioassay results are converted into estimates of human risk based on human exposure, and dose–response curves from animal tests are used to derive an equivalent human dose–response curve. The MCLG is set using a three‐category approach that depends on the evidence of carcinogenicity. If the evidence of carcinogenicity is strong, the MCLG is set to zero. If the evidence of carcinogenicity is limited, one of two methods is used to calculate the MCLG, depending on the toxicity data available. For every chemical compound that has an MCLG, an MCL or a treatment technique must be established. Considering the cost of treatment, the MCL is based on the best available technology (BAT) and set as close to the MCLG as feasible; if BAT cannot achieve zero, the MCL cannot equal zero. For carcinogens with an MCLG of zero, the MCL is generally set at a concentration corresponding to a lifetime excess cancer risk in the range of 10⁻⁴ to 10⁻⁶.

In the past, the reference dose (RfD), expressed in milligrams per kilogram of body weight per day (mg/kg/day), was used. The RfD is based on the lifetime exposure level at which there is no significant risk to humans. It is calculated by dividing the no observed adverse effect level (NOAEL) or the lowest observed adverse effect level (LOAEL) by an uncertainty factor (UF). The uncertainty factor accounts for differences between animals and humans and for variability within the human population, and it ranges from 10 to 1000 depending on the toxicity data available. The RfD is determined using the following equation:

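$$\mathrm{RfD} = \frac{\mathrm{NOAEL}\ \text{or}\ \mathrm{LOAEL}}{\mathrm{UF}} \qquad (2.1)$$

where UF is the uncertainty factor.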
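Equation (2.1) can be sketched in Python as a division of a NOAEL or LOAEL by a composite uncertainty factor, formed here as the product of individual factors. The function name, the factor labels, and the inputs (a NOAEL of 10 mg/kg/day and three factors of 10) are hypothetical illustrations. They do not come from a particular study.

```python
from math import prod


def reference_dose(point_of_departure, uncertainty_factors):
    """Eq. (2.1): RfD = (NOAEL or LOAEL) / UF, in mg/kg/day.

    uncertainty_factors maps a (hypothetical) label to each individual factor;
    their product is the composite UF.
    """
    uf = prod(uncertainty_factors.values())
    if uf <= 0:
        raise ValueError("uncertainty factors must be positive")
    return point_of_departure / uf


# Hypothetical inputs: NOAEL of 10 mg/kg/day and three factors of 10
factors = {"animal_to_human": 10, "human_variability": 10, "database_gaps": 10}
rfd = reference_dose(10.0, factors)
print(f"Composite UF = {prod(factors.values())}, RfD = {rfd} mg/kg/day")
```

Naming each component keeps every extrapolation step visible. This matters because the text notes that the overall factor can range from 10 to 1000.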

Besides relying on large uncertainty factors, RfDs derived from a NOAEL or LOAEL have major limitations: (i) the NOAEL is limited to one of the doses in the study and therefore depends on study design, (ii) it does not account for variability in the estimate of the dose–response, (iii) it does not account for the slope of the dose–response curve, and (iv) it cannot be applied when there is no NOAEL, except through the application of an uncertainty factor (Crump, 1984; Kimmel and Gaylor, 1988).

Example 2.1

Given: According to the US EPA Exposure Factors Handbook 2011 Edition (US EPA 2011), a chronic oral toxicity study of chloroform in dogs was used to derive an RfD of 0.01 mg/kg/day. This RfD is based on a LOAEL for hepatotoxicity and an uncertainty factor of 1000.

Find: The MCLG of chloroform.

Solution:

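Assume a 70 kg adult drinks 2 l of tap water per day, and apply a relative source contribution (RSC) of 80% for chloroform from drinking water. The drinking water equivalent level (DWEL) and the MCLG are then

$$\mathrm{DWEL} = \frac{\mathrm{RfD} \times \mathrm{BW}}{\mathrm{DWI}} = \frac{0.01\ \mathrm{mg/kg/day} \times 70\ \mathrm{kg}}{2\ \mathrm{l/day}} = 0.35\ \mathrm{mg/l}$$

$$\mathrm{MCLG} = \mathrm{DWEL} \times \mathrm{RSC} = 0.35\ \mathrm{mg/l} \times 0.8 = 0.28\ \mathrm{mg/l} \approx 0.3\ \mathrm{mg/l}$$

where BW is body weight and DWI is daily drinking water intake.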

Answer: MCLG of chloroform in drinking water is 0.3 mg/l based on hepatotoxicity (U.S. EPA, 1994).

Comment: The MCLG of 0.3 mg/l based on hepatotoxicity (U.S. EPA, 1994) is lower than the values determined from the LED₁₀/ED₁₀ approach for kidney tumorigenesis (0.6 or 1 mg/l) and is consistent with chloroform’s putative mode of action involving the oxidative generation of reactive and toxic metabolites (phosgene and hydrochloric acid).
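The arithmetic of Example 2.1 can be written as a short Python sketch. It assumes the usual chain from RfD to DWEL to MCLG, with DWEL = RfD × BW / DWI and MCLG = DWEL × RSC. It uses the example's inputs: 70 kg, 2 l/day, and RSC = 0.8. The function names are hypothetical.

```python
def dwel(rfd, body_weight=70.0, water_intake=2.0):
    """Drinking water equivalent level (mg/l) = RfD (mg/kg/day) x BW (kg) / DWI (l/day)."""
    return rfd * body_weight / water_intake


def mclg_from_rfd(rfd, rsc, body_weight=70.0, water_intake=2.0):
    """MCLG (mg/l) = DWEL x relative source contribution (RSC, a fraction)."""
    return dwel(rfd, body_weight, water_intake) * rsc


# Example 2.1 inputs: RfD = 0.01 mg/kg/day, 70 kg adult, 2 l/day, RSC = 80%
level = dwel(0.01)
goal = mclg_from_rfd(0.01, 0.80)
print(f"DWEL = {level:.2f} mg/l, MCLG = {goal:.2f} mg/l, rounded to {round(goal, 1)} mg/l")
```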

To overcome the large uncertainty factors of the RfD approach, the EPA developed benchmark dose (BMD) methods. These methods fit mathematical models to dose–response data and select a BMD associated with a predetermined benchmark response (BMR), such as a 10% increase in the incidence of a particular lesion or a 10% decrease in body weight gain. The output of each model includes a restatement of the model formula and the run options chosen by the user, goodness‐of‐fit information, the BMD, and the estimated lower confidence limit on the BMD (BMDL). The US EPA (2016) Benchmark Dose Software version 2.6 (BMDS 2.6) includes nested models, reporting of parameter standard errors, and parameter initialization for continuous models. BMDS 2.6 contains thirty different models appropriate for the analysis of dichotomous (quantal) data, continuous data, nested developmental toxicology data, multiple tumor analysis, and concentration–time data. Typical models used in the software are shown in Table 2.2.
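The BMR idea can be made concrete with the quantal‐linear model listed in Table 2.2, written here as P(d) = γ + (1 − γ)[1 − exp(−βd)]. In this Python sketch, the BMD is the dose at which the extra risk [P(d) − P(0)]/[1 − P(0)] equals the BMR. For this model the BMD has the closed form −ln(1 − BMR)/β. The parameter values are hypothetical. The sketch does not compute the BMDL, which requires a confidence‐limit calculation (for example, profile likelihood).

```python
import math


def quantal_linear(dose, background, slope):
    """Quantal-linear model: P(d) = g + (1 - g) * (1 - exp(-b * d))."""
    return background + (1.0 - background) * (1.0 - math.exp(-slope * dose))


def extra_risk(dose, background, slope):
    """Extra risk over the control response: [P(d) - P(0)] / [1 - P(0)]."""
    p0 = quantal_linear(0.0, background, slope)
    return (quantal_linear(dose, background, slope) - p0) / (1.0 - p0)


def bmd_quantal_linear(bmr, slope):
    """Dose at which the extra risk equals the BMR: -ln(1 - BMR) / b."""
    return -math.log(1.0 - bmr) / slope


# Hypothetical fitted parameters: 5% background incidence, slope 0.02 per mg/kg/day
g, b = 0.05, 0.02
bmd10 = bmd_quantal_linear(0.10, b)
print(f"BMD10 = {bmd10:.2f} mg/kg/day, extra risk at BMD10 = {extra_risk(bmd10, g, b):.3f}")
```

Extra risk divides out the background, so the BMD here does not depend on γ. This is one reason extra risk is a common BMR definition for dichotomous data.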

Table 2.2 Models used in the US EPA Benchmark Dose Software 2.6 (US EPA, 2015).

Model type	Model	Abbreviation
Continuous	Exponential	exp
	Hill	hil
	Linear	lin
	Polynomial	ply
	Power	pow
Dichotomous	Gamma	gam
	Logistic	log
	LogLogistic	lnl
	LogProbit	lnp
	Multistage	mst
	Multistage cancer	msc
	Probit	pro
	Weibull	wei
	Quantal linear	qln
	Dichotomous hill	dhl
Dichotomous alternative	Gamma‐BgDose	gmb
	Logistic‐BgResponse
	LogProbit‐BgDose	lpb
	Multistage‐BgDose	msb
	Multistage‐Cancer‐BgDose	mcb
	Probit‐BgResponse	prb
	Weibull‐BgDose	web
Nested	Nested logistic	nln
	NCTR	nct
	Rai and van Ryzin	rvr
Repeated response measures	ToxicoDiffusion	txd
Concentration × time	ten Berge	ten
Multitumor	MS_Combo	multi

The SDWA established standards and HAs for DBPs (US EPA, 1997). However, setting MCLs and MCLGs involves considerable uncertainty, because conservative policies may produce discrepancies between lifetime and longer‐term exposure HAs. For example, the uncertainty factor may vary from 5 to 5000 when the lifetime health advisory concentration is estimated (U.S. EPA, 1996). The US EPA uses human HRA and environmental risk assessment (ERA) to develop both ambient and discharge standards. Risk assessment is a systematic approach to characterizing the nature and magnitude of the risks associated with environmental or health hazards.

Because chlorination is the major disinfection process in the United States, regulating DBP concentrations in drinking water is one of the major challenges faced by the US EPA. Major human and financial resources have been devoted to identifying, monitoring, assessing, and regulating the human health effects of DBPs. As a result, HRA guidelines for DBPs were developed to protect the public from both biological and chemical risks. For example, the US EPA developed the Stage 2 Disinfectants and Disinfection Byproducts Rule (Stage 2 DBPR) to reduce peak DBP concentrations in the distribution system. When a water treatment plant (WTP) assesses its disinfection strategy, the decision must weigh both the effectiveness of the disinfectant against the target pathogens and the DBPs it forms. Because toxicity tests cannot be performed on humans, animal test data are used, and the associated uncertainty and variability must be quantified. During HRA of DBPs, the information needed to evaluate DBP risk accurately may be incomplete, and the uncertainty factor in the assessment may be quite large (Cothern et al., 1986). The EPA (2006) set the drinking water standards for total trihalomethanes (TTHM) and five haloacetic acids (HAA5) at 80 and 60 µg/l, respectively, as locational running annual averages (LRAA). If no specific toxicity information is available, the MCLG may be set on the basis of a quantitative structure–activity relationship (QSAR) study. The MCLGs of 1,1‐dichloroethylene and cis‐1,2‐dichloroethylene, for example, were developed using a QSAR approach.
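Under the Stage 2 DBPR, the LRAA is computed separately for each monitoring location. This Python sketch assumes routine quarterly sampling. It takes the LRAA as the mean of the four most recent quarterly results at each location and compares it with the 80 µg/l (TTHM) and 60 µg/l (HAA5) limits quoted above. The site names and concentrations are hypothetical.

```python
# Limits quoted in the text, in µg/l
MCL_UG_PER_L = {"TTHM": 80.0, "HAA5": 60.0}


def lraa(quarterly_results):
    """Mean of the four most recent quarterly results at one location (µg/l)."""
    if len(quarterly_results) < 4:
        raise ValueError("need at least four quarterly results")
    last_four = quarterly_results[-4:]
    return sum(last_four) / len(last_four)


def check_locations(results_by_location, contaminant):
    """Return {location: (LRAA, complies)} for one regulated DBP group."""
    limit = MCL_UG_PER_L[contaminant]
    report = {}
    for site, values in results_by_location.items():
        avg = lraa(values)
        report[site] = (avg, avg <= limit)
    return report


# Hypothetical TTHM results (µg/l), oldest to newest quarter
tthm = {"site_A": [62, 75, 88, 70], "site_B": [70, 95, 101, 84]}
for site, (avg, ok) in check_locations(tthm, "TTHM").items():
    print(site, round(avg, 2), "complies" if ok else "exceeds the MCL")
```

In this example, site_A complies even though one quarter (88 µg/l) exceeds 80 µg/l. Compliance is judged on the running annual average at each location, not on individual samples.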

In addition to pathogenic bacteria and viruses, Cryptosporidium has become a major public health concern for the EPA. The EPA established the Long Term 2 Enhanced Surface Water Treatment Rule (LT2ESWTR) based on Cryptosporidium concentrations in source water and current treatment practices. Four bins are defined, each corresponding to additional treatment requirements for filtered WTPs, as shown in Table 2.3. WTPs with average Cryptosporidium concentrations below 0.075 oocysts per liter (oocysts/l) are placed in bin 1, where no additional treatment is required. At concentrations of 0.075 oocysts/l or more, treatment beyond the existing processes is required. The additional treatment required for each bin, specified in terms of log removal, depends on the type of treatment the WTP already uses. In biological standards, the term “log” means an order‐of‐magnitude reduction in concentration; e.g. 2‐log removal equals a 99% reduction, 3‐log removal a 99.9% reduction, and 4‐log removal a 99.99% reduction. Giardia and viruses require 3‐log and 4‐log removal and/or inactivation, respectively.
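The log‐removal convention above reduces to two small formulas. An n‐log removal leaves a fraction 10⁻ⁿ, which is a percent reduction of 100(1 − 10⁻ⁿ). The log removal achieved by a process is log₁₀(C_in/C_out). Here is a minimal Python sketch with hypothetical influent and effluent concentrations:

```python
import math


def percent_reduction(n_log):
    """Percent reduction achieved by an n-log removal: 100 * (1 - 10**-n)."""
    return 100.0 * (1.0 - 10.0 ** (-n_log))


def log_removal(c_in, c_out):
    """Log removal achieved by a process: log10(C_in / C_out)."""
    if c_in <= 0 or c_out <= 0:
        raise ValueError("concentrations must be positive")
    return math.log10(c_in / c_out)


for n in (2, 3, 4):
    print(f"{n}-log removal = {percent_reduction(n):.2f}% reduction")

# Hypothetical plant data: 1.2 oocysts/l in source water, 0.0012 oocysts/l after treatment
print(f"Achieved: {log_removal(1.2, 0.0012):.1f}-log removal")
```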

Table 2.3 Bin requirements for filtered public water systems (PWSs) (US EPA, 2006).

Cryptosporidium concentration (oocysts/l)	Bin classification	If the following filtration treatment is operating in full compliance with existing regulations, the additional treatment requirements are:
Cryptosporidium concentration (oocysts/l)	Bin classification	Conventional filtration treatment (includes softening)	Direct filtration	Slow sand or diatomaceous earth filtration	Alternative filtration technologies
<0.075	1	No additional treatment	No additional treatment	No additional treatment	No additional treatment
≥0.075 and <1.0	2	1‐log treatment	1.5‐log treatment	1‐log treatment	As determined by the state
≥1.0 and <3.0	3	2‐log treatment	2.5‐log treatment	2‐log treatment	As determined by the state
≥3.0	4	2.5‐log treatment	3‐log treatment	2.5‐log treatment	As determined by the state
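Table 2.3 can be encoded as a lookup, sketched below in Python. The bin boundaries and log‐treatment values are copied from the table. The filtration‐type keys are hypothetical labels, and None stands for "as determined by the state".

```python
# Additional treatment (log) by bin for each filtration type in Table 2.3;
# None means "as determined by the state".
ADDITIONAL_LOG = {
    "conventional": {1: 0.0, 2: 1.0, 3: 2.0, 4: 2.5},
    "direct": {1: 0.0, 2: 1.5, 3: 2.5, 4: 3.0},
    "slow_sand_or_de": {1: 0.0, 2: 1.0, 3: 2.0, 4: 2.5},
    "alternative": {1: 0.0, 2: None, 3: None, 4: None},
}


def bin_for(mean_oocysts_per_l):
    """Bin classification from the average Cryptosporidium concentration."""
    c = mean_oocysts_per_l
    if c < 0:
        raise ValueError("concentration cannot be negative")
    if c < 0.075:
        return 1
    if c < 1.0:
        return 2
    if c < 3.0:
        return 3
    return 4


def additional_treatment(mean_oocysts_per_l, filtration):
    """Return (bin, additional log treatment) for a plant's filtration type."""
    b = bin_for(mean_oocysts_per_l)
    return b, ADDITIONAL_LOG[filtration][b]


print(additional_treatment(0.5, "conventional"))
```

Here a conventional plant averaging 0.5 oocysts/l falls in bin 2 and needs 1‐log of additional treatment.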

To reduce biological and chemical risks simultaneously, UV disinfection appears to be the best option for drinking water and treated wastewater effluent. Unlike chemical disinfectants, UV leaves no residual that can be monitored to determine the UV dose and the inactivation credit. The UV dose depends on the UV intensity (measured by UV sensors), the flow rate, and the UV transmittance (UVT). To earn disinfection credit, therefore, a relationship between the required UV dose and these operating parameters must be established and then monitored at the WTP to ensure sufficient inactivation of microbial pathogens. The US EPA (2006) recommended the UV dose requirements (mJ/cm²) listed in Table 2.4.

Table 2.4 UV dose requirements in millijoules per square centimeter (mJ/cm²) (US EPA, 2006).

Target pathogens	Log inactivation
Target pathogens	0.5	1.0	1.5	2.0	2.5	3.0	3.5	4.0
Cryptosporidium	1.6	2.5	3.9	5.8	8.5	12	15	22
Giardia	1.5	2.1	3.0	5.2	7.7	11	15	22
Virus	39	58	79	100	121	143	163	186
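As an idealized illustration only, this Python sketch compares a delivered UV dose with the requirement in Table 2.4. It assumes dose (mJ/cm²) = average intensity (mW/cm²) × exposure time (s), with exposure time approximated as irradiated volume divided by flow (ideal plug flow). Real compliance relies on a validated reactor dose that also accounts for UVT, hydraulics, and sensor readings, and this sketch ignores all of those. The reactor values are hypothetical.

```python
# Table 2.4: required UV dose (mJ/cm^2) by target pathogen and log inactivation
UV_DOSE_TABLE = {
    "Cryptosporidium": {0.5: 1.6, 1.0: 2.5, 1.5: 3.9, 2.0: 5.8, 2.5: 8.5, 3.0: 12, 3.5: 15, 4.0: 22},
    "Giardia": {0.5: 1.5, 1.0: 2.1, 1.5: 3.0, 2.0: 5.2, 2.5: 7.7, 3.0: 11, 3.5: 15, 4.0: 22},
    "Virus": {0.5: 39, 1.0: 58, 1.5: 79, 2.0: 100, 2.5: 121, 3.0: 143, 3.5: 163, 4.0: 186},
}


def required_dose(pathogen, log_inactivation):
    """Look up the Table 2.4 dose; only the tabulated log values are accepted."""
    return UV_DOSE_TABLE[pathogen][log_inactivation]


def idealized_dose(intensity_mw_per_cm2, volume_l, flow_l_per_s):
    """Dose (mJ/cm^2) = intensity (mW/cm^2) x exposure time (s),
    with exposure time approximated as volume / flow (ideal plug flow)."""
    return intensity_mw_per_cm2 * (volume_l / flow_l_per_s)


# Hypothetical reactor: 30 mW/cm^2 average intensity, 50 l irradiated volume, 100 l/s
dose = idealized_dose(30.0, 50.0, 100.0)
for pathogen in ("Cryptosporidium", "Virus"):
    need = required_dose(pathogen, 3.0)
    status = "meets" if dose >= need else "falls short of"
    print(f"{pathogen}: delivered {dose} mJ/cm^2 {status} the {need} mJ/cm^2 needed for 3-log")
```

The contrast follows directly from Table 2.4. Viruses need much higher doses than Cryptosporidium or Giardia for the same log inactivation.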

2.3 Health Risk Assessment

Human health risk assessment (HRA) quantifies factors such as absorption during ingestion, pharmacokinetics, mutagenicity, reproductive and developmental effects, and carcinogenicity (Pontius, 1990). Its purpose is to quantify the likelihood that adverse human health effects may occur, or are occurring, as a result of exposure to one or more stressors (US EPA, 1992). To reduce environmental health mortality, HRA is used to detect issues, identify hazards, characterize hazards, assess exposure, and manage and communicate risk. Using HRA, the EPA developed the Integrated Risk Information System (IRIS), a database of the human health effects that may result from exposure to various chemicals in the environment. The IRIS 10⁻⁶ risk level is the contaminant concentration (in µg/l) in drinking water that would yield an additional risk of no more than one in a million (10⁻⁶) after a lifetime of drinking that water. The acute 10‐day values apply specifically to acute toxic effects in children but are expected to be protective for adults. For noncarcinogenic chemicals, this value is typically the same as the MCLG. The chronic (lifetime) values for cancer are set at a level that should yield no more than an additional 10⁻⁶ risk over a lifetime of exposure. According to IRIS, EPA cancer risk is classified into six categories: (i) H, carcinogenic to humans; (ii) L, likely to be carcinogenic to humans; (iii) L/N, likely to be carcinogenic above a specified dose; (iv) S, suggestive evidence of carcinogenic potential; (v) I, inadequate information to assess carcinogenic potential; and (vi) N, not likely to be carcinogenic to humans.
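The IRIS 10⁻⁶ risk level can be illustrated with the standard linear low‐dose assumption: lifetime excess risk equals an oral slope factor times the lifetime average daily dose. This Python sketch assumes a 70 kg adult drinking 2 l/day, the same values as Example 2.1. It also assumes a hypothetical slope factor of 0.1 (mg/kg/day)⁻¹, which is not an IRIS value for any particular chemical.

```python
def risk_specific_concentration(slope_factor, risk=1e-6, body_weight=70.0, water_intake=2.0):
    """Drinking water concentration (µg/l) giving the target lifetime excess risk,
    assuming risk = slope_factor x lifetime average daily dose (linear low-dose model)."""
    dose = risk / slope_factor                         # mg/kg/day
    return dose * body_weight / water_intake * 1000.0  # mg/l converted to µg/l


def lifetime_risk(conc_ug_per_l, slope_factor, body_weight=70.0, water_intake=2.0):
    """Inverse calculation: excess lifetime risk from a drinking water concentration."""
    dose = conc_ug_per_l / 1000.0 * water_intake / body_weight  # mg/kg/day
    return slope_factor * dose


sf = 0.1  # hypothetical oral slope factor, (mg/kg/day)^-1
c6 = risk_specific_concentration(sf)
c4 = risk_specific_concentration(sf, risk=1e-4)
print(f"10^-6 risk level: {c6:.2f} µg/l; 10^-4 risk level: {c4:.0f} µg/l")
```

Under the linear model, the 10⁻⁴ to 10⁻⁶ risk range cited for MCLs of carcinogens spans exactly a factor of 100 in concentration.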

2.3.1 Hazard Identification

Hazard identification (HI) determines whether exposure to a stressor can increase the incidence of adverse health effects in humans; it reflects the capacity of an agent to cause adverse health effects in humans and other animals (US EPA, 1995). The qualitative description in HI is complemented by QSAR analysis, genetic toxicity and pharmacokinetic data, and a weight‐of‐evidence evaluation. For example, toxicokinetic data, which describe how the body absorbs, distributes, metabolizes, and eliminates specific chemicals, are commonly used in HI. In the HI of chlorination, chemical disinfectants such as chlorine and chloramines react with natural organic matter (NOM) to produce DBPs, some of which are potential carcinogens and cause developmental defects in laboratory animals. DBPs have been linked to potential health risks such as liver, kidney, or central nervous system problems and an increased risk of cancer. To quantify DBP toxicity, QSAR analysis based on animal toxicity tests is one important method for estimating DBP carcinogenicity. In terms of molecular structure, there are six major classes of DBPs: halogenated alkanes, alkenes, aromatics, aldehydes, ketones, and carboxylic acids. DBPs within a class usually share similar toxic modes or mechanisms of action. Table 2.5 lists common DBP classes and their definitions.

Table 2.5 Classification of DBPs.

DBP class	Definition
Trihalomethanes	Chemical compounds in which three of the four hydrogen atoms of methane (CH₄) are replaced by halogen atoms
Haloacetic acids	Carboxylic acids in which one or more halogen atoms replace hydrogen atoms on the methyl carbon of acetic acid
Haloacetonitriles	Small organic compounds containing nitrogen, chlorine, and/or bromine. Few data are available on the toxicity of the haloacetonitriles; however, animal studies suggest that dichloroacetonitrile is mutagenic and therefore potentially carcinogenic
Haloketone	A functional group consisting of a ketone group with an α‐halogen substituent. The general structure is RR′C(X)C(=O)R where R is an alkyl or aryl residue and X any one of the halogens
Other halogenated compounds	Chloropicrin, chloral hydrate, and cyanogen chloride
Aldehydes	Organic compounds containing a formyl group (R–CHO), in which a hydrogen atom is bonded to the carbonyl (double‐bonded) carbon
Inorganic compounds	Chlorine, chloramines, chlorine dioxide, and bromate

In addition to the main molecular structure (alkane, alkene, aromatic, and so on), substituents such as chlorine and bromine within each DBP class contribute significantly to the toxicity of the corresponding DBPs. Table 2.6 compares the relative order of potency of haloacetonitriles for different toxic mechanisms (Pontius, 1990). For example, haloalkanes are procarcinogens that are activated via dehalogenation reactions catalyzed by cytochrome P450 (CYP450). Brominated compounds are expected to have higher alkylating activity than chlorinated compounds. Compounds in this group with chlorine or bromine at the α‐carbon or terminal carbon are expected to be potential alkylating agents. Compounds with active halogens at both ends of the aliphatic chain are also expected to act as cross‐linking agents. The stability of several chlorinated ketones in aqueous solution follows the order 1,3‐dichloro > pentachloro ≫ hexachloro. Among the mutagenic chloropropanones (mostly direct acting), the relative mutagenic potency follows the order 1,3‐ > 1,1,3,3‐ > penta‐ > 1,1,3‐ > 1,1,1‐ > 1,1‐. The potency of 1,3‐ is about 100–1000 times higher than that of 1,2‐dichloroacetic acid (DCA) and trichloroacetic acid (TCA), which are known mouse liver carcinogens found as by‐products in drinking water. In animal tests, DCA produces developmental, reproductive, neural, and hepatic effects. Aldehydes, as electrophilic compounds, may form DNA–protein cross‐links and lead to carcinogenesis.

Table 2.6 Comparison of chemical, biochemical, and biologic properties of haloacetonitriles.

Chemical/biochemical/biologic tests in assessing carcinogenic potential	Relative order of potency of haloacetonitriles tested
Alkylating activity (4‐(p‐nitrobenzyl)pyridine reaction)	Br₂ ≫ BrCl > Cl ≫ Cl₂ ≫ Cl₃
Inhibition of glutathione S‐transferase	Cl₃ > Br₂ > Cl₂ > Br > Cl
Escherichia coli SOS chromotest	BrCl > Br₂ > Cl₂ ≫ Cl (inactive)
Ames or Ames fluctuation tests	Cl₂ ≈ BrCl > Cl₃ ≈ Br₂ > Cl ≈ Br (inactive)
Ames or Ames fluctuation tests	Cl₃ > Cl > Cl₂ > Br₂ ≈ Br (inactive)
DNA single‐strand breaks in HeLa cell (comet assay)	Cl₃ > BrCl > Br₂ > Cl₂ > Cl
DNA single‐strand breaks in HeLa cell (comet assay)	Br₂ > Cl₃ ≈ Cl₂ ≈ BrCl > Cl
Sister chromatid exchanges in Chinese hamster ovary cells	Br₂ > BrCl > Cl₃ ≥ Cl₂ > Cl
Newt micronucleus assay	Br₂ ≥ Cl₃ > Br > Cl₂ ≫ Cl
In vivo mouse micronucleus assay	Br₂ ≈ BrCl ≈ Cl₃ ≈ Cl₂ ≈ Cl (all inactive)
Lung adenoma assay in strain A mice	Cl ≈ Cl₃ ≈ BrCl > Br₂ ≈ Cl₂ (inactive)
Skin tumor initiation in SENCAR mice	Br₂ ≈ Cl ≈ BrCl > Cl₃ (inconsistent) > Cl₂ (inactive)

2.3.2 Dose–Response Curves

A dose–response relationship describes how the severity or incidence of adverse health effects (the responses) is related to the amount of exposure to a chemical compound. The measured response usually increases as the dose increases. To establish drinking water standards, dose–response curves from animal toxicity tests are used to establish toxicity slopes or BMDs and to document the dose–response relationship over the range of observed doses. With animal toxicity data, the health risk must be extrapolated below the lowest observed doses to estimate the dose level at which effects on human health begin. The shape of the dose–response curve depends on the chemical and on the kind of response, such as the incidence of disease or death. However, there are major gaps in extrapolating from animals to humans and from high doses to low doses to predict human health risk. Great uncertainty can be introduced during these extrapolations through either nonlinear or linear dose–response models for different modes of action. For example, animal tumor data are commonly summarized by a dose–response parameter such as the effective dose at 10% response (ED₁₀) or the lethal dose for 10% of the animals (LD₁₀). The actual dose depends heavily on the exposure route and the test animal. Table 2.7 lists the EPA’s median lethal doses (LD₅₀), the doses that kill 50% of test animals, for a range of chemical compounds. For dioxin (TCDD), the LD₅₀ is as low as 0.001 mg/kg. In inhalation toxicity tests of chlorine, the median lethal concentrations (LC₅₀) for rats and mice are 293 and 137 ppm for a 1 h exposure, respectively. As Table 2.7 shows, LD₅₀ values for different chemicals vary by about seven orders of magnitude, from 10 000 mg/kg for ethyl alcohol to 0.001 mg/kg for dioxin.
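The following Python sketch shows how parameters such as LD₁₀ and LD₅₀ are read from a fitted curve. It uses a log‐logistic model without background response, P(d) = 1/{1 + exp[−(a + b ln d)]}, and inverts it for a chosen response fraction. The model choice and the parameters a = −4 and b = 2 are hypothetical. They are not fitted to any data in Table 2.7.

```python
import math


def response(dose, a, b):
    """Log-logistic dose-response without background: P = 1 / (1 + exp(-(a + b*ln d)))."""
    if dose <= 0:
        return 0.0
    return 1.0 / (1.0 + math.exp(-(a + b * math.log(dose))))


def effective_dose(p, a, b):
    """Dose producing response fraction p (the LD_p when the response is death)."""
    logit = math.log(p / (1.0 - p))
    return math.exp((logit - a) / b)


a, b = -4.0, 2.0  # hypothetical fitted parameters, dose in mg/kg
ld50 = effective_dose(0.5, a, b)
ld10 = effective_dose(0.10, a, b)
print(f"LD50 = {ld50:.2f} mg/kg, LD10 = {ld10:.2f} mg/kg")
```

The ratio of LD₅₀ to LD₁₀ depends only on the slope b. This is why two chemicals with the same LD₅₀ can differ greatly at low doses.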

Table 2.7 LD₅₀ of typical chemicals based upon animal toxicity test (US EPA, 2006).

Chemical	LD₅₀ (mg/kg)	Chemical	LD₅₀ (with route and animal)
Ethyl alcohol	10 000	Caffeine	620 mg/kg – oral mouse
Sodium chloride	4 000		192 mg/kg – oral rat
			105 mg/kg – i.v. rat
			68 mg/kg – i.v. mouse
Ferrous sulfate	1 500	Chlorine (LC₅₀)	293 ppm/1 h – rat
			137 ppm/1 h – mouse
Morphine sulfate	900	THC (from marijuana)	175 mg/kg – i.v. mouse
Strychnine sulfate	150		155 mg/kg – i.v. rabbit
			100 mg/kg – i.v. dog
Nicotine	1	Mercury(I) chloride	210 mg/kg – oral rat
			8 mg/kg – i.v. mouse
Black widow	0.55	Mercury(II) chloride	37 mg/kg – oral rat
			10 mg/kg – oral mouse
Curare	0.50	Arsenic acid (V oxidation state)	48 mg/kg – oral rat
Rattlesnake	0.24	Arsenic trioxide (III oxidation state)	20 mg/kg – oral rat
Dioxin (TCDD)	0.001	Dimethylarsenic acid (methylated arsenic form used as a cotton defoliant)	700 mg/kg – oral rat

2.3.2.1 Nonlinear Dose–Response Assessment

In nonlinear dose–response assessment, the threshold of toxicity is the dose at which the effects (or their precursors) begin to occur. The no observed adverse effect level (NOAEL) is the highest exposure level at which no statistically or biologically significant increase is seen in the frequency or severity of an adverse effect between the exposed population and its control population. Different mathematical models are used to estimate the benchmark dose (BMD) or the benchmark dose lower confidence limit (BMDL) for benchmark responses in the range of 1 to 10%, depending on the toxicity test. The BMDL is a statistical lower confidence limit on the dose that produces the selected response. The lowest observed adverse effect level (LOAEL), NOAEL, or BMDL is used as the point of departure for extrapolation to lower doses. The EPA (2012) developed guidance on using dose–response modeling to obtain BMDs, i.e. dose levels corresponding to specific response levels near the low end of the observable range, as shown in Figure 2.9. In Figure 2.9, the points show the fraction of animals affected in each dose group, and the error bars show 95% confidence intervals. In developing the BMD, the EPA requires the following statistical information: (i) rationale, (ii) estimation procedure, (iii) estimates of model parameters, (iv) goodness of fit, such as the log‐likelihood and the Akaike information criterion (AIC), and (v) standardized residuals. The US EPA (2012) provides useful examples that illustrate important aspects of computing BMDs and BMDLs from simple datasets and endpoints using EPA’s BMDS package.

Fraction affected vs. dose displaying an ascending curve and dots with error bars, with BMD and BMDL indicated at the bottom left portion. At the top left is a legend indicating dots for data and line for multistage. — **Figure 2.9** Example of a model fit to dichotomous data, with BMD and BMDL indicated.

Example 2.2

Given: Table 2.8 lists dose–response data with the tumor endpoint of hepatocellular adenomas or carcinomas in a cancer bioassay using B6C3F1 mice exposed by gavage.

Table 2.8 Dose–response data.

(*Source:* NTP (National Toxicology Program) (1988).)

Administered dose (mg/kg/day)	Human equivalent dose (mg/kg/day)	Tumor incidence
0	0	6/50
50	2.83	10/49
100	5.67	19/50

Find: The BMD and the one-sided 95% lower confidence limit (BMDL) of chlorodibromomethane.

Solution:

Although US EPA's cancer guidelines (US EPA, 2005) emphasize that the choice of benchmark response (BMR) should be independent of the extrapolation method, 10% extra risk is a typical BMR for standard cancer bioassay data when linear extrapolation from the point of departure (POD) is used. Extra risk at dose d is defined as [P(d) − P(0)]/[1 − P(0)], where P is the probability of response. A BMR of 10% extra risk was therefore selected, as it is near the low end of the observable range. The one-sided BMDL was calculated at the 95% confidence level. EPA's cancer guidelines also recommend reporting an upper bound on the BMD, the benchmark dose upper confidence limit (BMDU), as a measure of uncertainty. Accordingly, the 95% one-sided BMDU was also estimated; together, the BMDL and BMDU form a 90% two-sided confidence interval.

Model fitting was performed with BMDS 2.6.0.1 (Benchmark Dose Software), available from https://www.epa.gov/bmds/download-benchmark-dose-software-bmds#installing.

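First, a 2nd-degree (i.e. n − 1, where n = 3 is the number of dose groups) multistage model was fitted to the data. The model form is

$$P(d) = \gamma + (1 - \gamma)\left[1 - \exp\left(-\beta_1 d - \beta_2 d^2\right)\right]$$

where P(d) is the probability of a tumor at human equivalent dose d, γ is the background response, and β₁ and β₂ are the dose coefficients (Table 2.9).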

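This model fits all three observations exactly (Table 2.9). With three parameters and three dose groups, no degrees of freedom remain, so the χ² goodness-of-fit p-value is undefined and the scaled residuals are all zero. The Akaike information criterion (AIC) was 158.7. The BMD and its lower bound (BMDL, estimated by the likelihood profile) were approximately 2.91 and 1.25 mg/kg/day, respectively, as marked in Figure 2.10.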

The MATLAB code used to plot the fitted 2nd-degree multistage model and the data means (Figure 2.10) is

clear
clc
% Observed data: human equivalent dose (mg/kg/day) and fraction of animals with tumors
x = [0 2.83 5.67];
y = [6/50 10/49 19/50];
% Dashed guide lines marking the BMD and BMDL at the BMR response level
xBMD  = [-0.6 2.91 2.91];
yBMD  = [0.208045678 0.208045678 0];
xBMDL = [-0.6 1.25 1.25];
yBMDL = [0.208045678 0.208045678 0];
% Fitted 2nd-degree multistage model (MLEs from Table 2.9)
model = @(d) 0.12 + (1 - 0.12)*(1 - exp(-0.00930036*d - 0.00925286*d.^2));
x4 = 0:0.01:7;          % fine dose grid for drawing the fitted curve
z4 = model(x4);
z3 = model(x);          % fitted values at the observed doses
figure
hold on
% Plain ASCII hyphens are required in line specs ('k-', 'k--')
plot(x,y,'ko','MarkerSize',3,'MarkerFaceColor','k','MarkerEdgeColor','k','LineWidth',1);
plot(x4,z4,'k-','LineWidth',1);
plot(xBMD,yBMD,'k--','LineWidth',1);
plot(xBMDL,yBMDL,'k--','LineWidth',1);
text(0.5,0.12,'BMDL','FontSize',11,'FontWeight','bold','FontName','Times New Roman')
text(3,0.12,'BMD','FontSize',11,'FontWeight','bold','FontName','Times New Roman')
% Coefficient of determination of the fit at the observed doses
R2 = 1 - sum((y - z3).^2)/sum((y - mean(y)).^2)
set(gca,'XLim',[-0.6 6],'YLim',[0.1 0.4],'FontWeight','bold','FontName','Times New Roman');
xlabel('Human Equivalent Dose (mg/kg/day)','FontSize',14,'FontWeight','bold','FontName','Times New Roman')
ylabel('Tumor Incidence','FontSize',14,'FontWeight','bold','FontName','Times New Roman')
legend('Data Means','Multistage','Location','NorthWest');
box on
grid on
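As a cross-check on the BMD marked in the plot, the short Python sketch below evaluates the fitted multistage model at the observed doses. It then solves for the dose at which the extra risk, [P(d) − P(0)]/[1 − P(0)] = 1 − exp(−β₁d − β₂d²), equals the 10% BMR. The function names are illustrative, and the root is found by simple bisection rather than by the BMDS algorithm. The BMDL is not computed because it requires the likelihood profile. The same function also applies to the 1st-degree fit described next.

```python
import math


def multistage_prob(dose, background, betas):
    """Multistage model: P(d) = g + (1 - g) * (1 - exp(-sum(beta_k * d**k)))."""
    poly = sum(b * dose ** (k + 1) for k, b in enumerate(betas))
    return background + (1.0 - background) * (1.0 - math.exp(-poly))


def extra_risk(dose, betas):
    """Extra risk [P(d) - P(0)] / [1 - P(0)]; the background term cancels out."""
    poly = sum(b * dose ** (k + 1) for k, b in enumerate(betas))
    return 1.0 - math.exp(-poly)


def bmd_multistage(betas, bmr=0.10, hi=100.0, tol=1e-10):
    """Dose at which extra risk equals the BMR, found by bisection.

    Assumes non-negative betas, so extra risk increases monotonically with dose.
    """
    lo = 0.0
    if extra_risk(hi, betas) < bmr:
        raise ValueError("BMR is not reached below the upper search bound")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if extra_risk(mid, betas) < bmr:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# MLEs of the 2nd-degree model (Table 2.9) and observed tumor fractions (Table 2.8)
background, betas2 = 0.12, [0.00930036, 0.00925286]
doses = [0.0, 2.83, 5.67]
observed = [6 / 50, 10 / 49, 19 / 50]

fitted = [multistage_prob(d, background, betas2) for d in doses]
for d, f, o in zip(doses, fitted, observed):
    print(f"dose {d:4.2f}: fitted {f:.4f}, observed {o:.4f}")

bmd2 = bmd_multistage(betas2)
print(f"BMD (2nd-degree, 10% extra risk) = {bmd2:.2f} mg/kg/day")
```

For the 1st-degree model the same call reduces to the closed form −ln(0.9)/β₁.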

Next, a 1st-degree multistage model was fitted to the data to see whether a more parsimonious model could also provide an adequate fit (Figure 2.11). The model form is

$$P(d) = \gamma + (1 - \gamma)\left[1 - \exp\left(-\beta_1 d\right)\right]$$

The 1st-degree multistage model also fitted the data adequately (Tables 2.10 and 2.11): the χ² goodness-of-fit p-value is 0.4494, and the scaled residuals in Table 2.11 are not unusually large. The AIC was 157.3. The BMD and BMDL were approximately 1.88 and 1.2 mg/kg/day, respectively, as marked in Figure 2.11.

The MATLAB code used to plot the fitted 1st-degree multistage model and the data means (Figure 2.11) is

clear
clc
% Observed data: human equivalent dose (mg/kg/day) and fraction of animals with tumors
x = [0 2.83 5.67];
y = [6/50 10/49 19/50];
% Dashed guide lines marking the BMD and BMDL at the BMR response level
xBMD  = [-0.6 1.88 1.88];
yBMD  = [0.200245794 0.200245794 0];
xBMDL = [-0.6 1.2 1.2];
yBMDL = [0.200245794 0.200245794 0];
% Fitted 1st-degree multistage model: background 0.111488, beta1 0.0559807
model = @(d) 0.111488 + (1 - 0.111488)*(1 - exp(-0.0559807*d));
x4 = 0:0.01:7;          % fine dose grid for drawing the fitted curve
z4 = model(x4);
z3 = model(x);          % fitted values at the observed doses
figure
hold on
% Plain ASCII hyphens are required in line specs ('k-', 'k--')
plot(x,y,'ko','MarkerSize',3,'MarkerFaceColor','k','MarkerEdgeColor','k','LineWidth',1);
plot(x4,z4,'k-','LineWidth',1);
plot(xBMD,yBMD,'k--','LineWidth',1);
plot(xBMDL,yBMDL,'k--','LineWidth',1);
text(0.4,0.12,'BMDL','FontSize',11,'FontWeight','bold','FontName','Times New Roman')
text(2,0.12,'BMD','FontSize',11,'FontWeight','bold','FontName','Times New Roman')
% Coefficient of determination of the fit at the observed doses
R2 = 1 - sum((y - z3).^2)/sum((y - mean(y)).^2)
set(gca,'XLim',[-0.6 6],'YLim',[0.1 0.4],'FontWeight','bold','FontName','Times New Roman');
xlabel('Human Equivalent Dose (mg/kg/day)','FontSize',14,'FontWeight','bold','FontName','Times New Roman')
ylabel('Tumor Incidence','FontSize',14,'FontWeight','bold','FontName','Times New Roman')
legend('Data Means','Multistage','Location','NorthWest');
box on
grid on

Answer: AIC can be used to compare models from different families fitted with a similar method (for example, least squares or binomial maximum likelihood). The AIC is lower for the 1st-degree model (157.3 versus 158.7), suggesting that it is the preferred model (Figure 2.11). If L represents the log-likelihood at the maximum likelihood estimates (MLEs) for p estimated parameters, AIC = −2L + 2p (Akaike, 1973; Linhart and Zucchini, 1986; Stone, 1998).
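The two AIC values can be checked by evaluating the binomial log-likelihood of each fitted model at its reported MLEs. The sketch below assumes that the constant binomial-coefficient terms are omitted from the log-likelihood. That choice does not affect the comparison, since both models are fitted to the same data, and with it the sketch gives AIC values of about 158.7 and 157.3. The parameter counts are 3 for the 2nd-degree model (background, β₁, β₂) and 2 for the 1st-degree model.

```python
import math

# Dose groups from Table 2.8: (human equivalent dose, responders, group size)
groups = [(0.0, 6, 50), (2.83, 10, 49), (5.67, 19, 50)]


def multistage_prob(dose, background, betas):
    """Multistage model probability of response at a given dose."""
    poly = sum(b * dose ** (k + 1) for k, b in enumerate(betas))
    return background + (1.0 - background) * (1.0 - math.exp(-poly))


def binomial_loglik(background, betas):
    """Binomial log-likelihood without the constant log C(n, y) terms."""
    total = 0.0
    for dose, y, n in groups:
        p = multistage_prob(dose, background, betas)
        total += y * math.log(p) + (n - y) * math.log(1.0 - p)
    return total


def aic(loglik, n_params):
    """Akaike information criterion, -2L + 2p."""
    return -2.0 * loglik + 2.0 * n_params


aic2 = aic(binomial_loglik(0.12, [0.00930036, 0.00925286]), 3)
aic1 = aic(binomial_loglik(0.111488, [0.0559807]), 2)
print(f"AIC, 2nd-degree model: {aic2:.1f}")
print(f"AIC, 1st-degree model: {aic1:.1f}")
print("preferred:", "1st-degree" if aic1 < aic2 else "2nd-degree")
```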

Table 2.9 Parameter estimates with standard errors for 2nd‐degree multistage model.

Parameter	Maximum likelihood estimates (MLEs)	Standard error
Background	0.12	0.132665
Beta1	0.00930036	0.141898
Beta2	0.00925286	0.0246904

Fitted 2nd-degree multistage model and data means represented by an ascending line and dots, respectively, for tumor incidence versus human equivalent dose and with dashed lines indicating BMDL and BMD. — **Figure 2.10** Fitted 2nd‐degree multistage model and data means.

Table 2.10 Parameter estimates with standard errors for 1st‐degree multistage model.

Parameter	Maximum likelihood estimates (MLEs)	Standard error
Background	0.111488	0.111488
Beta1	0.0559807	0.120556

Table 2.11 Goodness‐of‐fit table.

Dose	Estimated probability	Expected number responding	Observed number responding	Group size	Scaled residual
0.0000	0.1115	5.574	6	50	0.191
2.8300	0.2417	11.842	10	49	−0.615
5.6700	0.3531	17.657	19	50	0.397
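The goodness-of-fit statistics can be recomputed from the fitted 1st-degree model. The sketch below takes the scaled residual for each dose group to be the Pearson residual, (observed − expected)/√[n p(1 − p)], and the χ² statistic to be the sum of the squared residuals with 3 − 2 = 1 degree of freedom. Its upper-tail p-value is then erfc(√(χ²/2)). These definitions are assumed to match those used by BMDS. The resulting p-value of about 0.449 agrees with the 0.4494 quoted in the text.

```python
import math

# 1st-degree multistage MLEs and the dose groups (dose, responders, group size)
background, beta1 = 0.111488, 0.0559807
groups = [(0.0, 6, 50), (2.83, 10, 49), (5.67, 19, 50)]


def prob(dose):
    """Fitted probability of response at the given dose."""
    return background + (1.0 - background) * (1.0 - math.exp(-beta1 * dose))


chi2 = 0.0
residuals = []
for dose, observed, n in groups:
    p = prob(dose)
    expected = n * p
    # Scaled (Pearson) residual: (observed - expected) / sqrt(n p (1 - p))
    r = (observed - expected) / math.sqrt(n * p * (1.0 - p))
    residuals.append(r)
    chi2 += r * r
    print(f"dose {dose:4.2f}: P = {p:.4f}, expected = {expected:.3f}, residual = {r:+.3f}")

# Degrees of freedom: 3 dose groups minus 2 estimated parameters
df = len(groups) - 2
# For one degree of freedom, P(chi-square > x) = erfc(sqrt(x / 2))
p_value = math.erfc(math.sqrt(chi2 / 2.0))
print(f"chi-square = {chi2:.3f}, df = {df}, p = {p_value:.4f}")
```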

Fitted 1st-degree multistage model and data means represented by an ascending line and dots, respectively, for tumor incidence versus human equivalent dose and with dashed lines indicating BMDL and BMD. — **Figure 2.11** Fitted 1st‐degree multistage model and data means.

2.3.2.2 Linear Dose–Response Assessment

For carcinogens, if "mode of action" information is insufficient, linear extrapolation is typically used as the default approach for dose–response assessment. A straight line is drawn from the point of departure for the observed data (typically the BMDL) to the origin (zero dose, zero response). The slope of this line is referred to as the slope factor (SF), or cancer SF, and is used to estimate risk at exposure levels below the observed range. When a linear dose–response is used to assess cancer risk, the excess lifetime cancer risk is calculated as follows:

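$$\text{Risk} = \text{CDI} \times \text{SF} \qquad (2.2)$$

where Risk is the excess lifetime cancer risk (unitless probability), CDI is the chronic daily intake averaged over a lifetime (mg/kg/day), and SF is the slope factor ((mg/kg/day)⁻¹).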

Total cancer risk is calculated by adding the individual cancer risks for each pollutant in each pathway of concern (i.e. inhalation, ingestion, and dermal absorption) by using reasonable maximum exposure (RME) and then adding together the risk for all pathways.
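The linear default can be sketched in a few lines. The sketch assumes that the slope factor is the BMR divided by the BMDL (the slope of the line from the POD to the origin) and that the excess risk in each pathway is CDI × SF. A BMDL of 1.2 mg/kg/day is borrowed from the 1st-degree fit in Example 2.2 purely for illustration. The pathway intakes are hypothetical values, not data from this chapter. Only one pollutant is shown; with several pollutants, their risks would be added in the same way.

```python
# Linear extrapolation from the point of departure (POD) to the origin.
BMR = 0.10                  # benchmark response (10% extra risk)
BMDL = 1.2                  # POD in mg/kg/day (illustrative, from the 1st-degree fit in Example 2.2)
slope_factor = BMR / BMDL   # (mg/kg/day)^-1: slope of the line through (0, 0) and (BMDL, BMR)

# Hypothetical chronic daily intakes (mg/kg/day) for each pathway under RME
cdi_by_pathway = {
    "ingestion": 2.0e-5,
    "inhalation": 5.0e-6,
    "dermal": 1.0e-6,
}

# Excess lifetime cancer risk per pathway (CDI x SF), then summed over pathways
risk_by_pathway = {path: cdi * slope_factor for path, cdi in cdi_by_pathway.items()}
total_risk = sum(risk_by_pathway.values())

print(f"slope factor = {slope_factor:.4f} (mg/kg/day)^-1")
for path, risk in risk_by_pathway.items():
    print(f"{path:>10}: {risk:.2e}")
print(f"total excess lifetime cancer risk = {total_risk:.2e}")
```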

2.3.3 Exposure Assessment

Exposure assessment estimates the magnitude, frequency, and duration of human exposure to an agent in the environment, or estimates future exposures for an agent that has not yet been released. Exposure concentration can be measured at the point of contact (the outer boundary of the body), estimated by separately evaluating the exposure concentration and the exposure time through different scenarios, or reconstructed after the exposure from internal indicators (biomarkers, body burden, and excretion levels). Table 2.12 lists the variables used in the US EPA Guidelines for Exposure Assessment (1992) for HRA.

Table 2.12 Specific parameters in health risk assessment.

Parameter	Definition	Default – child	Default – adult
TRL	Target risk level (unitless)	10⁻⁶	10⁻⁶
BW	Body weight (kg)	15	70
AT	Averaging time (year)	70	70
SF_ABS	Absorbed cancer slope factor (mg/kg/day)⁻¹	Chemical specific	Chemical specific
ED	Exposure duration (year)	6	30
EV	Event frequency (events/day)	1	1
EF	Exposure frequency (days/year)	350	350
FA	Fraction absorbed (unitless)	Chemical specific	Chemical specific
t_event‐RME	Event duration (h)	1 (bathing)	0.58 (showering)
SA	Surface area (cm²)	6 600	18 000
K_p	Permeability coefficient (cm/h)	Chemical specific	Chemical specific
ABS_GI	Gastrointestinal absorption fraction (unitless)	Chemical specific	Chemical specific
τ_event	Lag time per event (h)	Chemical specific	Chemical specific
SF_o	Oral cancer slope factor (mg/kg/day)⁻¹	Chemical specific	Chemical specific
t^*	Time to reach steady state (h)	Chemical specific	Chemical specific
DAD	Dermal absorbed dose (mg/kg/day)	Site specific	Site specific
AD_event	Absorbed dose per event (mg/cm²/event)	Site specific	Site specific
B	Dimensionless ratio of the permeability coefficient of a compound through the stratum corneum relative to its permeability coefficient across the viable epidermis (ve)	Chemical specific	Chemical specific

The internal dose via the dermal route, µg/kg bw/day, can be calculated as follows:

(2.3)

where

2.3.3.1 Cancer Screening Calculation for Dermal Contaminants in Water

For a target cancer risk of 10⁻⁶ (or, for noncancer effects, a target hazard quotient), the following equations can be used to estimate the dermal absorbed dose (DAD) in mg/kg/day:

For cancer risk
(2.4)
For hazard quotient
(2.5)
Evaluate AD_event:
(2.6)
Evaluate permissible water concentration, C_w: For organics
(2.7)

(2.8)
For inorganics
(2.9)

Example 2.3 illustrates the steps used to calculate the cleanup level from dermal exposure to compounds in water given an acceptable risk of 10⁻⁶. The default scenarios used in the calculations are (i) the adult 30‐year exposure and (ii) an age‐adjusted 30‐year exposure incorporating a child bathing for 1 h/event (RME value), once a day, 350 days/year for 6 years and an adult showering at 35 min/event (RME value), once a day, 350 days/year for 24 years. The general equations could be applied to any compound, and the example gives the calculation for one compound in water with a cancer risk of 10⁻⁶.
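A minimal sketch of the screening chain for an inorganic compound is given below. Equations (2.4), (2.6), and (2.9) are not reproduced here, so the sketch assumes the forms used in US EPA RAGS Part E: DAD = TR/SF_ABS for cancer, AD_event = DAD × BW × AT/(EV × ED × EF × SA) with AT converted to days, and AD_event = K_p × C_w × t_event for inorganics. Solving the last form gives C_w = AD_event/(K_p × t_event). The adult defaults come from Table 2.12. The oral slope factor, ABS_GI, and K_p are hypothetical values for an unnamed chemical, so check every form against the equations in the text before reuse.

```python
# Screening-level water concentration C_w for an inorganic (adult, cancer endpoint).
# Equation forms are assumed (RAGS Part E style); chemical-specific inputs are hypothetical.
TR = 1e-6              # target risk level (Table 2.12)
SF_o = 1.0             # hypothetical oral cancer slope factor, (mg/kg/day)^-1
ABS_GI = 1.0           # hypothetical gastrointestinal absorption fraction
K_p = 1e-3             # hypothetical permeability coefficient, cm/h

BW = 70.0              # body weight, kg
AT_days = 70 * 365     # averaging time: 70 years expressed in days
ED = 30.0              # exposure duration, years
EF = 350.0             # exposure frequency, days/year
EV = 1.0               # event frequency, events/day
SA = 18000.0           # skin surface area, cm^2
t_event = 0.58         # event duration, h (showering, RME)

SF_abs = SF_o / ABS_GI                                  # slope factor on an absorbed-dose basis
DAD = TR / SF_abs                                       # allowable dermal absorbed dose, mg/kg/day
AD_event = DAD * BW * AT_days / (EV * ED * EF * SA)     # absorbed dose per event, mg/cm^2/event
C_w_mg_per_cm3 = AD_event / (K_p * t_event)             # inorganic form: AD_event = K_p * C_w * t_event
C_w_mg_per_L = C_w_mg_per_cm3 * 1000.0                  # 1000 cm^3 per litre

print(f"DAD      = {DAD:.2e} mg/kg/day")
print(f"AD_event = {AD_event:.3e} mg/cm^2/event")
print(f"C_w      = {C_w_mg_per_L:.4f} mg/L")
```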

2.3.3.2 Noncancer Screening Calculation for Contaminants in Residential Soil

The following equations are provided by the EPA in the exposure assessment process. The scenario to be evaluated is residential soil. Equations (2.10), (2.11), and (2.12) are used for calculating the soil concentration, C_soil:

Child or adult:

(2.10)

Age adjusted:

(2.11)

The age-adjusted, body-part-weighted dermal factor, SFS_adj, is given by Equation (2.12):

(2.12)

For toxicity assessment, cancer SF can be derived based on absorbed dose:

$$SF_{ABS} = \frac{SF_o}{ABS_{GI}} \qquad (2.13)$$

while RfD can be expressed as follows based on absorbed dose:

$$RfD_{ABS} = RfD_o \times ABS_{GI} \qquad (2.14)$$

Example 2.4

Exposure risk assessment of noncarcinogen cadmium.

Given: Cadmium in soil has both an oral reference dose (RfD) and a dermal absorption fraction (ABS_d = 0.001), which allows a quantitative evaluation. The level of concern is a hazard index of 1. All other parameters are listed in Table 2.13.

Table 2.13 Specific parameters in health risk assessment.

Parameter	Definition	Default – child	Default – adult	Default – age adjusted
THQ	Target hazard quotient (unitless)	1	1	1
BW	Body weight (kg)	15	70	—
AT	Averaging time (year)	6	30	30
RfD	Reference dose (mg/kg/day)	Chemical specific	Chemical specific	Chemical specific
ED	Exposure duration (year)	6	30	—
EV	Event frequency (events/day)	1	1	1
EF	Exposure frequency (days/year)	350	350	350
SA	Surface area (cm²)	2800	5700	—
AF	Adherence factor (mg/cm²/event)	0.2	0.07	—
ABS	Absorption fraction (unitless)	Chemical specific	Chemical specific	Chemical specific
SFS_adj	Age‐adjusted dermal factor (mg·year/kg·event)	—	—	360

Find: The soil concentration C_soil using child, adult, and age‐adjusted scenarios to estimate exposure risk to noncarcinogen cadmium.

Solution:

To determine the dermal RfD, the gastrointestinal (GI) absorption adjustment for cadmium is either 5% for water or, more applicable to this example, 2.5% for food. The dermal RfD is therefore obtained from Equation (2.14) as the oral RfD for cadmium in food (1 × 10⁻³ mg/kg/day in the US EPA IRIS database) multiplied by a GI absorption of 2.5%, which gives 2.5 × 10⁻⁵ mg/kg/day. The soil concentration C_soil is then calculated for each scenario as follows:

Child:
Adult:
Age adjusted:
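The arithmetic of Example 2.4 can be checked with a short script. Equations (2.10) to (2.12) are not reproduced here, so the sketch assumes the RAGS Part E residential-soil forms. For a child or adult, C_soil = THQ × RfD_ABS × BW × AT/(EF × ED × EV × SA × AF × ABS_d × 10⁻⁶ kg/mg). For the age-adjusted case, C_soil = THQ × RfD_ABS × AT/(EF × EV × SFS_adj × ABS_d × 10⁻⁶ kg/mg). The sketch also assumes SFS_adj = SA_c·AF_c·ED_c/BW_c + SA_a·AF_a·ED_a/BW_a, with 6 child years and 24 adult years. That form yields about 361, consistent with the rounded 360 in Table 2.13. The IRIS oral RfD of 1 × 10⁻³ mg/kg/day for cadmium in food is also an input assumption.

```python
# Noncancer dermal screening levels for cadmium in residential soil (Example 2.4).
# Equation forms are assumed (RAGS Part E style), not copied from the text.
THQ = 1.0               # target hazard quotient
RfD_oral = 1e-3         # mg/kg/day, oral RfD for cadmium in food (assumed IRIS value)
ABS_GI = 0.025          # GI absorption of cadmium from food
ABS_d = 0.001           # dermal absorption fraction from soil
RfD_abs = RfD_oral * ABS_GI     # Equation (2.14): RfD on an absorbed-dose basis
CF = 1e-6               # kg/mg conversion factor
EF, EV = 350.0, 1.0     # days/year, events/day


def c_soil(bw, at_years, ed, sa, af):
    """Child or adult soil concentration (mg/kg) at the target hazard quotient."""
    at_days = at_years * 365.0
    return (THQ * RfD_abs * bw * at_days) / (EF * ed * EV * sa * af * ABS_d * CF)


def sfs_adj(child=(2800.0, 0.2, 6.0, 15.0), adult=(5700.0, 0.07, 24.0, 70.0)):
    """Age-adjusted dermal factor (mg-year/kg-event): sum of SA * AF * ED / BW."""
    return sum(sa * af * ed / bw for sa, af, ed, bw in (child, adult))


def c_soil_age_adjusted(sfs, at_years=30.0):
    """Age-adjusted soil concentration (mg/kg) at the target hazard quotient."""
    at_days = at_years * 365.0
    return (THQ * RfD_abs * at_days) / (EF * EV * sfs * ABS_d * CF)


child = c_soil(bw=15.0, at_years=6.0, ed=6.0, sa=2800.0, af=0.2)
adult = c_soil(bw=70.0, at_years=30.0, ed=30.0, sa=5700.0, af=0.07)
factor = sfs_adj()
age_adj = c_soil_age_adjusted(360.0)   # uses the rounded factor from Table 2.13

print(f"RfD_ABS               = {RfD_abs:.1e} mg/kg/day")
print(f"SFS_adj (unrounded)   = {factor:.1f} mg-year/kg-event")
print(f"C_soil, child         = {child:.0f} mg/kg")
print(f"C_soil, adult         = {adult:.0f} mg/kg")
print(f"C_soil, age adjusted  = {age_adj:.0f} mg/kg")
```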

2.3.4 DBP Health Advisory Concentration

Many benchmark concentrations can be correlated with E_LUMO, and E_LUMO in turn correlates significantly with the number of chlorine atoms for a given class of DBPs with a specific carbon number (Tang and Wang, 2010). Example 2.5 illustrates such a correlation.

Example 2.5

Given: The number of chlorines, E_LUMO, and longer‐term exposure for a 70 kg adult of chlorinated 1‐carbon halogenated aliphatic compounds are shown in Table 2.14 (Jiang et al., 2000; Yang et al., 2000).

Table 2.14 Number of chlorines, E_LUMO, and longer‐term exposure for 70 kg adult of chlorinated 1‐carbon halogenated aliphatic compounds.

Compound	Number of chlorines	E_LUMO	Longer‐term exposure for 70 kg adult (mg/l)
Carbon tetrachloride	4	0.09465	0.3
Chloroform	3	0.12650	0.4
Dichloromethane	2	0.1662
Chloromethane	1	0.21662	1

Find:

Regression between E_LUMO and the number of chlorines for 1‐carbon halogenated aliphatic compounds by using Matlab.
The longer‐term exposure concentration for 70 kg adult versus E_LUMO for 1‐carbon halogenated aliphatic compounds by using Matlab.
How the number of chlorine atoms in 1-carbon halogenated aliphatic compounds affects the longer-term exposure concentration for a 70 kg adult.

Solution:

Regression between E_LUMO and the number of chlorines for 1‐carbon halogenated aliphatic compounds is conducted using the following Matlab code:

clear
clc
% Number of chlorine atoms and E_LUMO for the four chlorinated methanes (Table 2.14)
NumberofChlorines = [4 3 2 1];
ELUMO = [0.09465 0.12650 0.1662 0.21662];
n = 1;                                        % first-order (linear) fit
p = polyfit(NumberofChlorines, ELUMO, n)      % p(1) = slope k2, p(2) = intercept
x4 = 0.5:0.01:4.5;                            % trend-line grid matching the x-axis limits
z3 = polyval(p, NumberofChlorines);           % fitted values at the data points
z4 = polyval(p, x4);
figure
hold on
% Plain ASCII hyphen is required in the line spec 'k-'
plot(NumberofChlorines,ELUMO,'ko','MarkerSize',3,'MarkerFaceColor','k','MarkerEdgeColor','k','LineWidth',1);
plot(x4,z4,'k-','LineWidth',1);
R2 = 1 - sum((ELUMO - z3).^2)/sum((ELUMO - mean(ELUMO)).^2)
set(gca,'XLim',[0.5 4.5],'FontWeight','bold','FontName','Times New Roman');
xlabel('Number of Chlorines','FontSize',14,'FontWeight','bold','FontName','Times New Roman')
ylabel('E_{LUMO}','FontSize',14,'FontWeight','bold')
legend('Original Data','Trendline','Location','NorthEast','FontWeight','bold','FontName','Times New Roman');
box on
grid on

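Figure 2.12 shows that E_LUMO is closely correlated with the number of chlorine atoms and decreases as that number increases, according to the following equation:

$$E_{LUMO} = -0.0406\,N_{Cl} + 0.2524 \qquad (R^2 = 0.990)$$

where N_Cl is the number of chlorine atoms.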

The relationship between the longer-term exposure concentration for a 70 kg adult and E_LUMO for these compounds is shown in Figure 2.13.

ELUMO versus number of chlorines for 1-carbon halogenated aliphatic compounds displaying a descending line (trend line) along with 4 dots (original data). — **Figure 2.12** E_LUMO versus number of chlorines for 1‐carbon halogenated aliphatic compounds.

Longer-term exposure for 70 kg adult versus ELUMO for 1-carbon halogenated aliphatic compounds displaying a descending line (trend line) along with 3 dots (original data). — **Figure 2.13** Longer‐term exposure for 70 kg adult versus E_LUMO for 1‐carbon halogenated aliphatic compounds.

Regression between longer‐term exposure dose for a 70 kg adult and E_LUMO for 1‐carbon halogenated aliphatic compounds:

The longer-term exposure dose for a 70 kg adult is closely related to E_LUMO:

$$D_{LTED,70} = 5.94\,E_{LUMO} - 0.300 \qquad (R^2 = 0.985)$$

where D_{LTED, 70} is the longer‐term exposure dose for a 70 kg adult and E_LUMO is the energy of the lowest unoccupied molecular orbital.

clear
clc
% Longer-term exposure (mg/l) and E_LUMO for CCl4, CHCl3, and CH3Cl (Table 2.14);
% dichloromethane is omitted because its exposure value is not listed
LongerTermExposurefor70kgAdult = [0.3 0.4 1];
ELUMO = [0.09465 0.12650 0.21662];
n = 1;                                        % first-order (linear) fit
p = polyfit(ELUMO, LongerTermExposurefor70kgAdult, n)   % p(1) = slope k1, p(2) = intercept
x4 = 0.075:0.001:0.225;                       % trend-line grid matching the x-axis limits
z3 = polyval(p, ELUMO);                       % fitted values at the data points
z4 = polyval(p, x4);
figure
hold on
% Plain ASCII hyphen is required in the line spec 'k-'
plot(ELUMO,LongerTermExposurefor70kgAdult,'ko','MarkerSize',3,'MarkerFaceColor','k','MarkerEdgeColor','k','LineWidth',1);
plot(x4,z4,'k-','LineWidth',1);
R2 = 1 - sum((LongerTermExposurefor70kgAdult - z3).^2)/sum((LongerTermExposurefor70kgAdult - mean(LongerTermExposurefor70kgAdult)).^2)
set(gca,'XLim',[0.075 0.225],'FontWeight','bold','FontName','Times New Roman');
xlabel('E_{LUMO}','FontSize',14,'FontWeight','bold','FontName','Times New Roman')
ylabel({'Longer Term Exposure';'for 70 kg Adult (mg/l)'},'FontSize',14,'FontWeight','bold')
legend('Original Data','Trendline','Location','NorthWest','FontWeight','bold','FontName','Times New Roman');
box on
grid on

For chlorinated 1-carbon halogenated aliphatic compounds, the longer-term exposure for a 70 kg adult increases with E_LUMO, as shown in Figure 2.13.
The slope k₁ of the longer-term exposure dose for a 70 kg adult versus E_LUMO is 5.94, and the slope k₂ of E_LUMO versus the number of chlorines is −0.0406.

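Comments: For chlorinated 1-carbon halogenated aliphatic compounds, the longer-term exposure concentration for a 70 kg adult decreases as the number of chlorine atoms increases. Because k₁ > 0 and k₂ < 0, combining the two regressions gives a change of roughly k₁k₂ = 5.94 × (−0.0406) ≈ −0.24 mg/l per additional chlorine atom.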
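Both regressions, and the chained slope k₁k₂, can be reproduced without MATLAB. The sketch below is a plain ordinary-least-squares fit, equivalent to polyfit with n = 1, on the Table 2.14 values. Dichloromethane is left out of the second fit because its longer-term exposure value is not listed. The product k₁k₂ is an extrapolation through two separate regressions, not a directly fitted quantity.

```python
def linear_fit(x, y):
    """Ordinary least-squares line y = slope * x + intercept, plus R^2."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return slope, intercept, 1.0 - ss_res / ss_tot


# Table 2.14: CCl4, CHCl3, CH2Cl2, CH3Cl
n_cl = [4, 3, 2, 1]
e_lumo = [0.09465, 0.12650, 0.1662, 0.21662]
k2, b2, r2_a = linear_fit(n_cl, e_lumo)

# Longer-term exposure (mg/l) is listed only for CCl4, CHCl3, and CH3Cl
e_lumo_3 = [0.09465, 0.12650, 0.21662]
d_lted = [0.3, 0.4, 1.0]
k1, b1, r2_b = linear_fit(e_lumo_3, d_lted)

print(f"E_LUMO    = {k2:.4f} * N_Cl {b2:+.4f}   (R^2 = {r2_a:.3f})")
print(f"D_LTED,70 = {k1:.2f} * E_LUMO {b1:+.3f}   (R^2 = {r2_b:.3f})")
print(f"chained change per chlorine atom: {k1 * k2:.3f} mg/l")
```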

2.3.5 Risk Characterizations

Risk characterization conveys the risk assessor's judgment as to the nature and presence or absence of risks. It should explain how the risk was assessed, where assumptions and uncertainties exist, and where policy choices will need to be made. EPA recommends that the characterization fully and explicitly disclose the risk assessment methods, default assumptions, logic, rationale, extrapolations, uncertainties, and overall strength of each step in the assessment. After risk assessment, risk management and risk communication are key to protecting public health. Each component of the risk assessment (e.g. hazard assessment, dose–response assessment, exposure assessment) has an individual risk characterization for the key findings, assumptions, limitations, and uncertainties. Risk characterization applies to both human HRA and ecological risk assessment.

2.4 QSAR Analysis in HRA

The US EPA permits and uses quantitative structure–activity relationship (QSAR) principles to classify and prioritize DBPs, because more than 600 DBPs have been identified and cataloged by the US EPA but only a small fraction of them have been studied for toxicity quantification. QSAR can be used to predict the toxicity of a specific DBP or a molecular property of a DBP, such as log P, which reflects the hydrophobicity of a chemical compound. Experts apply mechanism-based structure–activity relationship (SAR) principles relative to known carcinogens, considering structural analogy to known carcinogens, toxicokinetic and toxicodynamic factors, potency indicators for a structural analog, short-term test data, and metabolic activation. QSAR analysis therefore plays a unique role in setting drinking water standards.

QSAR analysis is based on the simple premise that similar chemical structures are expected to exhibit similar chemical behavior. With tens of thousands of chemical structures to assess and many different empirical tests, QSAR is often used for strategic screening in product development through HI and risk assessment. Given sufficient knowledge of structurally or functionally related compounds, QSAR may be used to screen well-defined biological, toxicological, or pharmacological endpoints of interest and associated kinetic characteristics such as absorption, distribution, metabolism, and excretion. In QSAR analysis, chemical structure is quantitatively correlated with a well-defined process, such as the biological activity, chemical reactivity, or toxicity of a chemical compound. The US EPA uses QSAR analysis to prioritize DBPs using three general types of predictive models:

Mathematical models such as SARs and QSARs use descriptors and mathematical relationships to derive predictions. Examples of SAR/QSAR models are ECOSAR and selected modules contained in EPISuite™, such as WSKOWWIN.
Fragment‐based models such as the BioWin module within EPISuite evaluate the features of molecular fragments present on the molecule to make predictions.
Expert systems, like OncoLogic™, use rule‐based decision trees to mimic an expert’s judgment. Other expert systems utilize artificial neural networks and molecular models.

There are hundreds of chemicals that have been identified as DBPs, a large proportion of which fall into the general category of halogenated DBPs. Due to evidence of carcinogenicity and developmental effects, halogenated DBPs are of particular interest. Two major molecular descriptors can be used in predicting the toxicity of DBP. One is log P and the other is the number of carbon and halogen atoms. For QSAR analysis, halogenated DBPs are classified into eight classes as illustrated in Figure 2.14.

Structural formulas of halogenated alkane, halogenated alkene, halogenated aromatic, halogenated aldehyde, halogenated ketone, halogenated carboxylic acid, heterocycle, and a DBP compound. — **Figure 2.14** Classification of DBPs.

According to the molecular structure of DBPs, Pierotti (1999) developed a DBP database containing more than one thousand DBP compounds for the eight classes of DBPs. The database includes more than 20 different physical and chemical characteristics for each DBP compound. Major variables are chemical name, address, subclass, CAS number, SMILES, disinfection process, molecular formula and structure, molecular weight, log P, cLog P, log P reference, melting point (°C), boiling point (°C), vapor pressure (mm Hg), solubility (mol/l), E_HOMO, E_LUMO, surface area, MCLG (mg/l), MCL (mg/l), 10-day exposure for a 10 kg child (mg/l), long-term exposure for a 10 kg child (mg/l), long-term exposure for a 70 kg adult (mg/l), mg/l at 10⁻⁴ cancer risk, cancer group, SDWA reference, and toxicological effect. The database is used in developing QSAR models in the following section to illustrate statistical methods such as regression, outlier detection, quantification of uncertainty and sensitivity, and validation of QSAR models.

2.4.1 Multiple Linear Regression (MLR)

Multiple linear regression (MLR) is a common algorithm in deriving QSAR models. It relates the dependent variable y to a number of independent (predictor) variables, x_j, by using a linear equation as follows:

$$\hat{y} = b_0 + \sum_{j=1}^{m} b_j x_j \qquad (2.15)$$

where

ŷ = calculated dependent variable
b_0 = intercept
x_j = predictor variable (j = 1, …, m)
b_j = regression coefficient

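Goodness of fit is quantified by the coefficient of multiple determination, R², calculated by Equation (2.16). R² estimates the proportion of the variation of y that is explained by the regression equation (Massart et al., 1997). If there is a perfect fit, R² = 1, while if there is no linear relationship between the dependent and independent variables, R² = 0:

$$R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} \qquad (2.16)$$

where y_i is the observed value for compound i, ŷ_i is its fitted value from Equation (2.15), ȳ is the mean of the observed values, and n is the number of compounds.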
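Equations (2.15) and (2.16) can be illustrated with a small least-squares solver. The sketch below uses only the Python standard library. It builds the normal equations (XᵀX)b = Xᵀy with an intercept column, solves them by Gauss-Jordan elimination, and reports R². The toy data are hypothetical and constructed to lie exactly on y = 1 + 2x₁ − 0.5x₂, so the fit should recover those coefficients with R² = 1. For real descriptor sets, a QR-based routine such as numpy.linalg.lstsq is numerically safer than forming XᵀX.

```python
def solve(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [bi] for row, bi in zip(a, b)]      # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [v - f * w for v, w in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def mlr_fit(rows, y):
    """Fit y = b0 + sum_j b_j x_j by ordinary least squares; return (coefficients, R^2)."""
    X = [[1.0] + list(r) for r in rows]                  # prepend the intercept column
    p = len(X[0])
    xtx = [[sum(xr[i] * xr[j] for xr in X) for j in range(p)] for i in range(p)]
    xty = [sum(xr[i] * yi for xr, yi in zip(X, y)) for i in range(p)]
    b = solve(xtx, xty)
    y_hat = [sum(bj * xj for bj, xj in zip(b, xr)) for xr in X]
    y_bar = sum(y) / len(y)
    ss_res = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return b, 1.0 - ss_res / ss_tot                      # Equation (2.16)


# Hypothetical data lying exactly on y = 1 + 2*x1 - 0.5*x2
rows = [(0.0, 1.0), (1.0, 0.0), (2.0, 3.0), (3.0, 1.0), (4.0, 5.0)]
y = [1 + 2 * x1 - 0.5 * x2 for x1, x2 in rows]
coef, r2 = mlr_fit(rows, y)
print("b =", [round(c, 6) for c in coef], " R^2 =", round(r2, 6))
```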

In predicting the toxicity of a chemical compound, two important molecular descriptors are hydrophobicity, such as log P, and electronic properties, such as the energy of the lowest unoccupied molecular orbital (E_LUMO). Log P reflects the hydrophobicity of molecules, which often correlates well with the bioactivity of chemicals (Leo and Hansch, 1999). The logarithm of the partition coefficient (log P), also referred to as log K_ow, describes the distribution of a compound between an organic phase (usually n-octanol) and a water phase by the following equation:

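$$\log P = \log_{10} \frac{C_{\text{octanol}}}{C_{\text{water}}} \qquad (2.17)$$

where C_octanol and C_water are the equilibrium concentrations of the compound in the n-octanol and water phases, respectively.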

If log P is greater than zero, the compound is more soluble in the organic phase; if log P is less than zero, the compound is more soluble in the aqueous phase. Chen et al. (2016) reported that log P can be predicted accurately from E_LUMO, the number of carbon atoms (N_C), and the number of chlorine atoms (N_Cl). A general MLR model to predict log P is

$$\log P = b_0 + b_1 E_{LUMO} + b_2 N_{Cl} + b_3 N_C \qquad (2.18)$$

To develop a robust QSAR model, outliers have to be detected so that they do not distort the fit. The Hotelling test and the associated leverage statistics can be used to detect outliers. The leverage of a chemical measures its distance from the centroid of the active set in descriptor space. Chemicals in the active set have leverage values between 0 and 1. A warning leverage (h^*), defined in Equation (2.19) (Eriksson et al., 2003), is a critical value used to cut off outliers from the dataset:

$$h^* = \frac{3(p + 1)}{n} \qquad (2.19)$$

where p is the number of descriptors in the model and n is the number of compounds in the training set. For example, with p = 3 and n = 46 (Example 2.6), h^* = 12/46 = 0.26087.

The Williams plot is a double-ordinate Cartesian plot of cross-validation residuals (first ordinate) and standardized residuals (second ordinate) against leverage values h (hat diagonal, abscissa). It defines the applicability domain of the model as a square area bounded by a ±2 band for residuals and the leverage threshold h^*.
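Leverage values can be computed directly from the descriptor matrix. The leverage h_i is the i-th diagonal element of the hat matrix X(XᵀX)⁻¹Xᵀ, where X includes an intercept column. The standard-library sketch below applies this to ten compounds from Table 2.15 (E_LUMO, N_Cl, N_C), chosen only to illustrate the calculation. It is not the full 46-compound analysis, so its leverages are not comparable to Table 2.16. A useful check is that the leverages always sum to the number of model parameters, here 4.

```python
def invert(a):
    """Inverse of a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        piv = m[col][col]
        m[col] = [v / piv for v in m[col]]               # scale the pivot row to 1
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [v - f * w for v, w in zip(m[r], m[col])]
    return [row[n:] for row in m]                        # right half is the inverse


def leverages(descriptors):
    """Diagonal of the hat matrix X (X^T X)^-1 X^T, with an intercept column in X."""
    X = [[1.0] + list(d) for d in descriptors]
    p = len(X[0])
    xtx_inv = invert([[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)])
    return [sum(r[i] * xtx_inv[i][j] * r[j] for i in range(p) for j in range(p)) for r in X]


def warning_leverage(n_descriptors, n_compounds):
    """Equation (2.19): h* = 3 (p + 1) / n."""
    return 3.0 * (n_descriptors + 1) / n_compounds


# Ten compounds from Table 2.15: list no. -> (E_LUMO, N_Cl, N_C), for illustration only
subset = {2: (0.22, 1, 4), 10: (0.22, 1, 1), 18: (0.22, 1, 2), 21: (0.19, 2, 2),
          25: (0.17, 2, 1), 27: (0.17, 2, 2), 34: (0.13, 3, 2), 37: (0.09, 3, 1),
          42: (0.15, 4, 2), 43: (0.09, 4, 1)}
h = leverages(list(subset.values()))
for no, h_i in zip(subset, h):
    print(f"compound {no:2d}: h = {h_i:.3f}")
print(f"sum of leverages = {sum(h):.3f} (number of parameters: 4)")
print(f"h* for the full dataset (p = 3, n = 46) = {warning_leverage(3, 46):.5f}")
```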

Example 2.6

Given: Table 2.15 lists the halogenated alkane compounds selected for QSAR model development. The table consists of the list number, the chemical name, and molecular formula. The log P values were obtained from Hansch’s QSAR database (Hansch et al., 1995). Molecular descriptor, E_LUMO, was calculated using the latest edition of Spartan software. N_Cl and N_C are presented in Table 2.15 with reference to their molecular structures.

Table 2.15 Halogenated alkane compounds used in QSAR analysis.

List no.	DBP chemical name	Molecular formula	log P	E_LUMO	N_Cl	N_C
1	Bromochloromethane	CH₂BrCl	1.39	0.14	1	1
2	1‐Chlorobutane	C₄H₉Cl	2.52	0.22	1	4
3	1‐Chlorohexane	C₆H₁₃Cl	3.58	0.22	1	6
4	1‐Bromo‐3‐chloropropane	C₃H₆BrCl	1.85	0.16	1	3
5	2‐Bromo‐2‐methylpropane	C₄H₉Br	2.53	0.17	1	4
6	2‐Chlorobutane	C₄H₉Cl	2.52	0.21	1	4
7	1‐Bromopentane	C₅H₁₁Br	3.19	0.19	1	5
8	2‐Chloro‐2‐methylbutane	C₅H₁₁Cl	2.92	0.20	1	5
9	1‐Chloroheptane	C₇H₁₅Cl	4.11	0.22	1	7
10	Chloromethane	CH₃Cl	0.94	0.22	1	1
11	1‐Chloropropane	C₃H₇Cl	1.99	0.21	1	3
12	2‐Chloropropane	C₃H₇Cl	1.99	0.21	1	3
13	1‐Chloropentane	C₅H₁₁Cl	3.05	0.22	1	5
14	1‐Chlorooctane	C₈H₁₇Cl	4.64	0.22	1	8
15	1‐Chlorodecane	C₁₀H₂₁Cl	5.7	0.22	1	10
16	2‐Chlorohexane	C₆H₁₃Cl	3.581	0.21	1	6
17	3‐Chlorohexane	C₆H₁₃Cl	3.581	0.22	1	6
18	Chloroethane	C₂H₅Cl	1.47	0.22	1	2
19	Bromochloroiodomethane	CHBrClI	2.62	0.08	1	1
20	Chlorotribromomethane	CBr₃Cl	3.295	0.07	1	1
21	1,2‐Dichloroethane	C₂H₄Cl₂	1.46	0.19	2	2
22	1,2‐Dichloropropane	C₃H₆Cl₂	1.99	0.19	2	3
23	Bromochloromethane	CH₂BrCl	1.39	0.14	1	1
24	1,5‐Dichloropentane	C₅H₁₀Cl₂	2.77	0.20	2	5
25	Dichloromethane	CH₂Cl₂	1.25	0.17	2	1
26	trans‐1,2‐Dichlorocyclohexane	C₆H₁₀Cl₂	3.3	0.18	2	6
27	1,1‐Dichloroethane	C₂H₄Cl₂	1.78	0.17	2	2
28	1,2‐Dichlorobutane	C₄H₈Cl₂	2.52	0.19	2	4
29	2,2‐Dichloropropane	C₃H₆Cl₂	2.31	0.17	2	3
30	Dichlorodifluoromethane	CCl₂F₂	2	0.10	2	1
31	Dichloroiodomethane	CHCl₂I	2.482	0.08	2	1
32	2,3‐Dichlorobutane	C₄H₈Cl₂	2.52	0.18	2	4
33	1,2‐Dichloro‐2‐methylbutane	C₅H₁₀Cl₂	2.91	0.18	2	5
34	1,1,1‐Trichloroethane	C₂H₃Cl₃	2.48	0.13	3	2
35	1,1,2‐Trichloroethane	C₂H₃Cl₃	2.05	0.16	3	2
36	1,2,3‐Trichloropropane	C₃H₅Cl₃	1.98	0.16	3	3
37	Fluorotrichloromethane	CCl₃F	2.44	0.09	3	1
38	1,1,2‐Trichloropropane	C₃H₅Cl₃	2.58	0.16	3	3
39	1,1,3‐Trichloropropane	C₃H₅Cl₃	1.98	0.16	3	3
40	1,2,2‐Trichloropropane	C₃H₅Cl₃	2.58	0.15	3	3
41	1,1,1‐Trichloropropane	C₃H₅Cl₃	3.01	0.13	3	3
42	1,1,2,2‐Tetrachloroethane	C₂H₂Cl₄	2.64	0.15	4	2
43	Carbon tetrachloride	CCl₄	2.88	0.09	4	1
44	1,1,1,2‐Tetrachloroethane	C₂H₂Cl₄	3.03	0.26	4	2
45	Pentachloroethane	C₂HCl₅	3.63	0.12	5	2
46	Hexachloroethane	C₂Cl₆	4.61	0.11	6	2

Find: A QSAR model to predict log P of halogenated alkane DBPs using the MLR method.

Solution:

Outlier detection by leverage approach

Statistical software SAS can be used to analyze the DBP datasets, and it reports statistical parameters such as R², adjusted R², mean square error (MSE), and F-value. Analyzing the model applicability domain (AD) in the Williams plot with the data in Table 2.16 identifies four DBP compounds as outliers, as labeled in Figure 2.15. Table 2.15 shows that these compounds are 1-bromo-3-chloropropane (4), chlorotribromomethane (20), 1,1,1,2-tetrachloroethane (44), and hexachloroethane (46). Compounds 20, 44, and 46 have leverages greater than the h^* value of 0.26087, while compounds 4, 44, and 46 have standardized residuals outside the ±1.93218 cutoff. Compared with the other DBP compounds in the dataset, these molecules may have structural features that are not well captured by the selected descriptors (response outliers) or that are too particular to the dataset (influential chemicals). For example, 1-bromo-3-chloropropane (4) and chlorotribromomethane (20) contain bromine atoms, whereas 1,1,1,2-tetrachloroethane (44) and hexachloroethane (46) contain a high number of chlorine atoms.

Table 2.16 Leverage values of halogenated alkanes.

List no.	Standardized residuals	Leverage h	Leverage threshold h^*	Cutoff value (−)	Cutoff value (+)
1	−0.21192	0.09885	0.26087	−1.93218	1.93218
2	0.32864	0.17935	0.26087	−1.93218	1.93218
3	1.27523	0.18739	0.26087	−1.93218	1.93218
4	2.51973	0.21456	0.26087	−1.93218	1.93218
5	0.39921	0.1208	0.26087	−1.93218	1.93218
6	−0.96939	0.04923	0.26087	−1.93218	1.93218
7	0.26173	0.06586	0.26087	−1.93218	1.93218
8	0.26173	0.06586	0.26087	−1.93218	1.93218
9	0.51915	0.05641	0.26087	−1.93218	1.93218
10	−0.39389	0.04334	0.26087	−1.93218	1.93218
11	0.3323	0.04671	0.26087	−1.93218	1.93218
12	0.32604	0.04487	0.26087	−1.93218	1.93218
13	−0.05965	0.04323	0.26087	−1.93218	1.93218
14	0.58972	0.05059	0.26087	−1.93218	1.93218
15	0.66029	0.06233	0.26087	−1.93218	1.93218
16	0.47556	0.06114	0.26087	−1.93218	1.93218
17	0.66241	0.06233	0.26087	−1.93218	1.93218
18	0.73086	0.09166	0.26087	−1.93218	1.93218
19	0.80143	0.13855	0.26087	−1.93218	1.93218
20	0.94257	0.28508	0.26087	−1.93218	1.93218
21	−0.53812	0.06966	0.26087	−1.93218	1.93218
22	−0.25567	0.0907	0.26087	−1.93218	1.93218
23	0.39272	0.12862	0.26087	−1.93218	1.93218
24	−0.77241	0.05482	0.26087	−1.93218	1.93218
25	−0.46755	0.03743	0.26087	−1.93218	1.93218
26	−0.70184	0.03166	0.26087	−1.93218	1.93218
27	−0.39698	0.02278	0.26087	−1.93218	1.93218
28	−1.03817	0.03157	0.26087	−1.93218	1.93218
29	−0.63127	0.02607	0.26087	−1.93218	1.93218
30	−0.81812	0.02411	0.26087	−1.93218	1.93218
31	−0.96760	0.0393	0.26087	−1.93218	1.93218
32	−1.04442	0.04036	0.26087	−1.93218	1.93218
33	−1.27073	0.07417	0.26087	−1.93218	1.93218
34	−0.09936	0.09095	0.26087	−1.93218	1.93218
35	−0.32045	0.04486	0.26087	−1.93218	1.93218
36	−0.67173	0.04238	0.26087	−1.93218	1.93218
37	−1.87347	0.03703	0.26087	−1.93218	1.93218
38	−0.60116	0.03703	0.26087	−1.93218	1.93218
39	−1.87347	0.03703	0.26087	−1.93218	1.93218
40	−0.78801	0.03857	0.26087	−1.93218	1.93218
41	−0.24988	0.05228	0.26087	−1.93218	1.93218
42	0.24379	0.11295	0.26087	−1.93218	1.93218
43	−0.19734	0.08404	0.26087	−1.93218	1.93218
44	2.68499	0.34074	0.26087	−1.93218	1.93218
45	0.75155	0.16316	0.26087	−1.93218	1.93218
46	2.05293	0.27957	0.26087	−1.93218	1.93218
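The outlier decision can be reproduced from Table 2.16 alone. The sketch below transcribes the standardized residuals and leverages. It flags a compound when its leverage exceeds h^* or its standardized residual falls outside the ±1.93218 cutoff. Applying both criteria is an assumption about how the four outliers were selected, but it returns exactly compounds 4, 20, 44, and 46. The sketch also confirms that the tabulated leverages sum to about 4, the number of parameters in a three-descriptor model with an intercept.

```python
H_STAR = 0.26087      # warning leverage, 3 (p + 1) / n with p = 3, n = 46
CUTOFF = 1.93218      # standardized-residual cutoff from Table 2.16

# Table 2.16, compounds 1-46 in list order
residuals = [
    -0.21192, 0.32864, 1.27523, 2.51973, 0.39921, -0.96939, 0.26173, 0.26173, 0.51915, -0.39389,
    0.3323, 0.32604, -0.05965, 0.58972, 0.66029, 0.47556, 0.66241, 0.73086, 0.80143, 0.94257,
    -0.53812, -0.25567, 0.39272, -0.77241, -0.46755, -0.70184, -0.39698, -1.03817, -0.63127, -0.81812,
    -0.96760, -1.04442, -1.27073, -0.09936, -0.32045, -0.67173, -1.87347, -0.60116, -1.87347, -0.78801,
    -0.24988, 0.24379, -0.19734, 2.68499, 0.75155, 2.05293,
]
leverage = [
    0.09885, 0.17935, 0.18739, 0.21456, 0.1208, 0.04923, 0.06586, 0.06586, 0.05641, 0.04334,
    0.04671, 0.04487, 0.04323, 0.05059, 0.06233, 0.06114, 0.06233, 0.09166, 0.13855, 0.28508,
    0.06966, 0.0907, 0.12862, 0.05482, 0.03743, 0.03166, 0.02278, 0.03157, 0.02607, 0.02411,
    0.0393, 0.04036, 0.07417, 0.09095, 0.04486, 0.04238, 0.03703, 0.03703, 0.03703, 0.03857,
    0.05228, 0.11295, 0.08404, 0.34074, 0.16316, 0.27957,
]

high_leverage = [no for no, h in enumerate(leverage, start=1) if h > H_STAR]
large_residual = [no for no, r in enumerate(residuals, start=1) if abs(r) > CUTOFF]
flagged = sorted(set(high_leverage) | set(large_residual))

print("leverage above h*:     ", high_leverage)
print("residual beyond cutoff:", large_residual)
print("outliers:              ", flagged)
print(f"sum of leverages = {sum(leverage):.3f}")
```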

Williams plot for detecting outliers in halogenated alkane with h* = 0.26087. — **Figure 2.15** Williams plot for detecting outliers in halogenated alkane.

(*Source:* Reproduced with permission of Springer.)

Multivariable linear regression
After the four outliers were removed, a QSAR model was developed with a coefficient of determination (R²) of 0.891 as follows:

(2.20)

(n = 42, R² = 0.891, RMSE = 0.310, PRESS RMSE = 0.385, F = 103.936)

Statistical results for halogenated alkanes after outlier removal are summarized in Table 2.17. The three-variable model (E_LUMO, N_Cl, and N_C) has the lowest MSE and AIC and the highest R² and adjusted R², although Schwarz's SBC slightly favors the two-variable model. Removing the outliers improves all the statistical indicators of the QSAR model for predicting log P of halogenated alkanes.

Table 2.17 Statistics of QSAR models for log P of halogenated alkane after removal of outliers.

No. of variables	Variables	MSE	R²	Adjusted R²	Mallows’ C_p	Akaike’s AIC	Schwarz’s SBC
1	N_C	0.280	0.667	0.642	78.506	−51.508	−48.033
2	E_LUMO/N_C	0.101	0.883	0.874	4.838	−93.538	−88.324
3	E_LUMO/N_Cl/N_C	0.096	0.891	0.883	4.000	−94.563	−87.612
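The information criteria in Table 2.17 follow from the MSE once the sample size and the number of parameters are fixed. The sketch below assumes the usual least-squares forms AIC = n ln(SSE/n) + 2k and SBC = n ln(SSE/n) + k ln n. Here SSE = MSE × (n − k), n = 42 compounds, and k is the number of variables plus the intercept. The tabulated MSEs are rounded to three decimals, so the recomputed criteria agree with the table only to within about 0.15.

```python
import math

n = 42   # compounds remaining after removing the four outliers
# name -> (number of variables, MSE, tabulated AIC, tabulated SBC), from Table 2.17
models = {
    "N_C": (1, 0.280, -51.508, -48.033),
    "E_LUMO/N_C": (2, 0.101, -93.538, -88.324),
    "E_LUMO/N_Cl/N_C": (3, 0.096, -94.563, -87.612),
}

results = {}
for name, (n_vars, mse, aic_tab, sbc_tab) in models.items():
    k = n_vars + 1                  # regression coefficients plus the intercept
    sse = mse * (n - k)             # MSE = SSE / (n - k)
    base = n * math.log(sse / n)
    aic = base + 2 * k
    sbc = base + k * math.log(n)
    results[name] = (aic, sbc)
    print(f"{name:16s} AIC {aic:8.3f} (table {aic_tab})   SBC {sbc:8.3f} (table {sbc_tab})")

best_aic = min(results, key=lambda m: results[m][0])
best_sbc = min(results, key=lambda m: results[m][1])
print("lowest AIC:", best_aic, " lowest SBC:", best_sbc)
```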

Figure 2.16 is a box plot showing the standardized coefficient of each variable (E_LUMO, N_Cl, and N_C) after the outliers were removed. N_C contributes most strongly and positively to log P (standardized coefficient 1.236). E_LUMO contributes negatively and to a lesser extent (−0.558). N_Cl contributes positively but less than either N_C or E_LUMO in magnitude (0.113).

Standardized coefficients vs. variable displaying boxes with error bars labeled E(LUMO), –0.558; NCI, 0.113; and NC, 1.236 depicting each variable contribution to log P in halogenated alkane. — **Figure 2.16** Box plot of each variable contribution to log P in halogenated alkane after removal of outliers.

(*Source:* Reproduced with permission of Springer.)

Figure 2.17 shows that all the predicted log P values fall within the 95% confidence band of the observed log P values.

Observed log P vs. predicted log P displaying circle markers along an ascending solid line from the origin between two ascending dotted lines. — **Figure 2.17** Measured log P versus predicted log P for halogenated alkane after removal of outliers.

(*Source:* Reproduced with permission of Springer.)

2.4.2 Validation of QSAR Models

Through model validation, a QSAR model should be balanced between the two extremes of overfitting and underfitting. Model validation covers internal performance (goodness of fit and robustness) and external performance (predictivity). The OECD (2007) recommends cross‐validation, bootstrapping, response randomization tests, and training/test set splitting. In cross‐validation, a number of modified datasets are created by deleting one compound or a small group of compounds from the data, in such a way that each compound is left out once. In each case, the reduced dataset (training set or active set) is used to develop a partial model, while the remaining data (validation set) are used to evaluate the model's predictivity (Efron, 1983; Osten, 1988).

The leave‐one‐out (LOO) method is the simplest cross‐validation procedure: compounds are removed one at a time. For n compounds, n reduced models are developed. Each is fitted to the remaining n − 1 compounds and used to predict the response of the deleted compound. The predictive power of the model is then measured by the sum of squared differences between the observed and predicted responses (the predictive residual sum of squares, PRESS). The LOO method quantifies each compound's influence on the QSAR model well. However, its estimate of predictive power is often too optimistic, particularly for large datasets, because removing a single compound is only a small perturbation.
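The LOO procedure can be sketched for an ordinary least-squares MLR model with NumPy. The function refits the model n times, each time without one compound, and accumulates PRESS. It also reports Q² = 1 − PRESS/TSS.

The descriptor values below are synthetic and only mimic the ranges of E_LUMO, N_Cl, and N_C. They are not the book's dataset. "PRESS RMSE" is assumed here to mean √(PRESS/n).

```python
import numpy as np

def loo_press(X, y):
    """Leave-one-out cross-validation for an ordinary least-squares MLR model.

    X : (n, k) descriptor matrix; y : (n,) observed responses (e.g. log P).
    Returns (PRESS, Q2): PRESS is the sum of squared LOO prediction errors,
    and Q2 = 1 - PRESS / TSS is the cross-validated R^2.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    A = np.column_stack([np.ones(n), X])       # prepend intercept column (b0)
    press = 0.0
    for i in range(n):
        keep = np.arange(n) != i               # drop compound i
        coef, *_ = np.linalg.lstsq(A[keep], y[keep], rcond=None)
        press += (y[i] - A[i] @ coef) ** 2     # predict the left-out compound
    tss = np.sum((y - y.mean()) ** 2)
    return float(press), float(1.0 - press / tss)

# Synthetic, illustrative data (hypothetical, NOT the halogenated alkane set)
rng = np.random.default_rng(0)
n = 42
X = np.column_stack([
    rng.uniform(0.05, 0.22, n),   # E_LUMO-like values
    rng.integers(1, 6, n),        # N_Cl-like counts 1..5
    rng.integers(1, 11, n),       # N_C-like counts 1..10
])
y = 2.6 - 12.0 * X[:, 0] + 0.1 * X[:, 1] + 0.53 * X[:, 2] + rng.normal(0, 0.3, n)
press, q2 = loo_press(X, y)
print(f"PRESS = {press:.3f}, Q2 = {q2:.3f}, PRESS RMSE = {np.sqrt(press / n):.3f}")
```

For least squares, the same PRESS can also be obtained without refitting. It equals Σ[e_i/(1 − h_ii)]², where e_i are the ordinary residuals and h_ii the leverages, the quantities behind the Williams plot.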

The leave‐many‐out (LMO) cross‐validation method removes more than one compound at a time. The dataset is divided into a number of blocks (referred to as cancellation groups), and each time all the compounds in one block are left out of the model derivation. Compared with the LOO method, the LMO method gives a more realistic estimate of predictive power because it introduces a larger perturbation in the dataset. There is no standard rule for assigning compounds to blocks, so the blocks are normally defined by the model user. Random clustering can also be used: one or more compounds are randomly selected into a cancellation block and left out of the derivation of the QSAR model.
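A minimal LMO sketch follows, assuming random assignment of compounds to a user-chosen number of cancellation groups (`n_groups` and `seed` are illustrative parameters, not from the source). Each whole block is left out in turn, and PRESS and Q² are accumulated from the out-of-block predictions.

```python
import numpy as np

def lmo_press(X, y, n_groups=5, seed=0):
    """Leave-many-out CV: randomly assign compounds to n_groups cancellation
    groups, leave each whole group out in turn, and predict it.

    Returns (PRESS, Q2) computed from the out-of-group predictions.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    A = np.column_stack([np.ones(n), X])               # intercept + descriptors
    rng = np.random.default_rng(seed)
    groups = np.array_split(rng.permutation(n), n_groups)  # random blocks
    press = 0.0
    for block in groups:
        keep = np.ones(n, dtype=bool)
        keep[block] = False                            # leave the whole block out
        coef, *_ = np.linalg.lstsq(A[keep], y[keep], rcond=None)
        press += np.sum((y[block] - A[block] @ coef) ** 2)
    tss = np.sum((y - y.mean()) ** 2)
    return float(press), float(1.0 - press / tss)
```

When `n_groups` equals the number of compounds, each block holds a single compound and the procedure reduces to LOO. Smaller values of `n_groups` give the larger perturbations described above.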

Example 2.7

Given: The developed QSAR model of Equation (2.20) is as follows:

(2.20)

(n = 42, R² = 0.891, RMSE = 0.310, PRESS RMSE = 0.385, F = 103.936)

Find: Validate the QSAR model of Equation (2.20) using the leave‐one‐out (LOO) and leave‐many‐out (LMO) cross‐validation methods.

Solution:

The whole dataset is divided so that the active and validation sets span approximately the same range of log P values, with approximately 20% of the initial compounds set aside for external validation. In each DBP class, the active and validation sets are formed from the compounds remaining after outlier removal, which are assigned to the validation set at random by computer. For halogenated alkanes (HA), this gives 37 compounds in the active set (n_c) and 5 in the validation set (n_v). The QSAR model is first validated by the LOO method and then by the LMO method, and the results of the two methods are compared.
LOO Cross‐validation
Each compound in the halogenated alkane class is singled out in turn as the validation compound, and the MLR regression is refitted to the remaining compounds. The results of the LOO method are illustrated in Figure 2.18, where the numbers 1, 2, 3, …, 45, 46 are the sequence numbers of the compounds listed in Table 2.15. In the radar graph of Figure 2.18, the R² values of the LOO validation models range from 0.85 to 0.92, which shows that the model is robust. Compound 15, 1‐chlorodecane, lies closer to the radar center because its R² value is relatively low. Removing 1‐chlorodecane noticeably decreases R², so it is an influential data point for this QSAR model. The average squared cross‐validation correlation coefficient is 0.891, indicating both the robustness and the high predictive ability of the model.

Figure 2.18 Radar graph of R² for halogenated alkane by LOO CV.

(Source: Reproduced with permission of Springer.)
LMO Cross‐validation
The 42 halogenated alkanes in the dataset are randomly split by computer into an active set of 37 compounds and a validation set of 5 compounds. As in the QSAR model development, an MLR equation for predicting log P is obtained:

(2.21)

(n = 42, R² = 0.898, MSE = 0.101, PRESS RMSE = 0.370)

Figure 2.19 plots the observed versus predicted log P values for the active and validation sets. All five validation compounds fall within the 95% confidence interval, indicating that the model is robust and has high predictive power for log P of halogenated alkanes.

Figure 2.19 Predicted log P versus measured log P for halogenated alkane validation.

(Source: Reproduced with permission of Springer.)

Answer: QSAR models of log P in relation to E_LUMO, N_Cl, and N_C for halogenated alkanes as one class of DBPs were developed and validated.

Comments: Statistical validation methods indicate how robust a developed QSAR model is. However, the ultimate validation is against external data that were not used in developing the model.

Example 2.8

Given: The leave‐one‐out (LOO) and leave‐many‐out (LMO) cross‐validation (CV) methods were used to assess the goodness of fit, robustness, and predictive ability of QSAR models for six DBP classes: halogenated alkane, halogenated alkene, halogenated aromatic, halogenated aldehyde, halogenated ketone, and halogenated carboxylic acid. The key statistical results from the two methods are compared in Table 2.18.

Table 2.18 Statistical parameters of LOO and LMO CV for QSAR models.

Group no.	DBP class	R² (LOO)	R² (LMO)	R² Diff (%)	MSE (LOO)	MSE (LMO)	MSE Diff (%)
1	Halogenated alkane	0.891	0.903	1.35	0.096	0.095	−1.04
2	Halogenated alkene	0.948	0.951	0.32	0.100	0.119	19.00
3	Halogenated aromatic	0.819	0.830	1.28	0.144	0.142	−1.39
4	Halogenated aldehyde	0.952	0.951	−0.11	0.091	0.099	8.79
5	Halogenated ketone	0.976	0.980	0.41	0.044	0.041	−5.91
6	Halogenated carboxylic acid	0.871	0.947	8.70	0.090	0.046	−48.67
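The Diff (%) columns are consistent with the relative change (LMO − LOO)/LOO × 100. Assuming that definition, the sketch below recomputes them from the three-decimal values in the table. Discrepancies of up to about one percentage point (for example, the halogenated ketone MSE) presumably reflect rounding of the tabulated values.

```python
# (R2_LOO, R2_LMO, MSE_LOO, MSE_LMO) copied from Table 2.18
TABLE_2_18 = {
    "Halogenated alkane": (0.891, 0.903, 0.096, 0.095),
    "Halogenated alkene": (0.948, 0.951, 0.100, 0.119),
    "Halogenated aromatic": (0.819, 0.830, 0.144, 0.142),
    "Halogenated aldehyde": (0.952, 0.951, 0.091, 0.099),
    "Halogenated ketone": (0.976, 0.980, 0.044, 0.041),
    "Halogenated carboxylic acid": (0.871, 0.947, 0.090, 0.046),
}

def pct_diff(loo, lmo):
    """Relative change of the LMO value with respect to the LOO value, in %."""
    return (lmo - loo) / loo * 100.0

diffs = {
    cls: (pct_diff(r2_loo, r2_lmo), pct_diff(mse_loo, mse_lmo))
    for cls, (r2_loo, r2_lmo, mse_loo, mse_lmo) in TABLE_2_18.items()
}
for cls, (d_r2, d_mse) in diffs.items():
    print(f"{cls:28s} R2 diff {d_r2:6.2f}%  MSE diff {d_mse:7.2f}%")
```

The output also shows that LMO gives the higher R² for five of the six classes. The exception is halogenated aldehyde.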

Find: The robustness of QSAR models for six DBP classes.

Solution:

The molecular descriptors E_LUMO, N_Cl, and N_C are appropriate for developing log P QSAR models because all R² values exceed 0.80. Each descriptor contributes differently to log P. For halogenated alkanes, N_Cl and N_C correlate positively with log P, while E_LUMO correlates negatively. Both CV methods produce similar results. For most classes, however, the LMO method gives a slightly higher R² and a lower MSE than the LOO method (Figure 2.20). The exceptions are halogenated alkene, whose LMO MSE is higher, and halogenated aldehyde, whose LMO R² is marginally lower and LMO MSE higher.

Figure 2.20 Comparison of R² obtained by LOO and LMO CV.

(Source: Reproduced with permission of Springer.)

Comments: The LMO CV method gives a higher R² than the LOO CV method for five of the six DBP classes. For halogenated carboxylic acid, the MSE difference between the LOO and LMO CV methods is as large as −48.67%.

2.5 Quantification of Uncertainty

All models have intrinsic uncertainty, which must be quantified before risks can be communicated to the public. Constraints, uncertainties, and assumptions that affect a risk assessment should be considered explicitly at each step. A quantitative description of uncertainty in HRA, carrying capacity, and climate change is therefore critical for policy makers and the general public. For example, because not all consequences of different emissions can be quantified, the IPCC Fifth Assessment (2015) adopted the following two metrics for communicating uncertainty in its findings:

Confidence in the validity of a finding, based on the type, amount, quality, and consistency of evidence (e.g. mechanistic understanding, theory, data, models, expert judgment) and the degree of agreement. Confidence is expressed qualitatively.
Quantified measures of uncertainty in a finding expressed probabilistically (based on statistical analysis of observations or model results or expert judgment).

For policy makers, the IPCC uses the following terms to express the assessed likelihood of an outcome or result quantitatively. Seven descriptive terms are assigned to seven likelihood ranges, as listed in Table 2.19.

Table 2.19 Technical terms used for describing the probability of an outcome (IPCC, 2013).

Term	Likelihood probability of an outcome (%)
Virtually certain	99–100
Very likely	90–100
Likely	66–100
About as likely as not	33–66
Unlikely	0–33
Very unlikely	0–10
Exceptionally unlikely	0–1
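The ranges in Table 2.19 are nested rather than disjoint; for example, 95% is both "likely" and "very likely". The sketch below maps a probability to a term. It assumes, as a convention not stated in the table, that the most specific (narrowest) matching term should be reported.

```python
# Likelihood ranges from Table 2.19, ordered from the narrowest to the widest.
# Assumption: when ranges overlap, the most specific (narrowest) term is used.
LIKELIHOOD_TERMS = [
    ("Virtually certain", 0.99, 1.00),
    ("Exceptionally unlikely", 0.00, 0.01),
    ("Very likely", 0.90, 1.00),
    ("Very unlikely", 0.00, 0.10),
    ("Likely", 0.66, 1.00),
    ("Unlikely", 0.00, 0.33),
    ("About as likely as not", 0.33, 0.66),
]

def likelihood_term(p):
    """Map a probability p in [0, 1] to a calibrated likelihood term."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    for term, low, high in LIKELIHOOD_TERMS:
        if low <= p <= high:
            return term
    raise RuntimeError("no matching range")  # not reached: ranges cover [0, 1]
```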

To quantify uncertainty, the US EPA distinguishes scenario, parameter, and model uncertainty. Table 2.20 lists the sources of uncertainty in each category, each with a specific example.

Table 2.20 Type of uncertainty with sources and examples (US EPA).

Type of uncertainty	Sources	Examples
Scenario uncertainty	Descriptive errors	Incorrect or insufficient information
	Aggregation errors	Spatial or temporal approximations
	Judgment errors	Selection of an incorrect model
	Incomplete analysis	Overlooking an important pathway
Parameter uncertainty	Measurement errors	Imprecise or biased measurements
	Sampling errors	Small or unrepresentative samples
	Variability	In time, space, or activities
	Surrogate data	Structurally related chemicals
Model uncertainty	Relationship errors	Incorrect inference on the basis of correlations
	Modeling errors	Excluding relevant variables

Furthermore, the US EPA lists different quantification methods of uncertainty with examples in Table 2.21.

Table 2.21 Approaches to quantitative analysis of uncertainty.

Approach	Description	Examples
Sensitivity analysis	Changing one input variable at a time while leaving others constant, to examine effect on output	Fix each input at lower (then upper) boundary while holding others at nominal values (e.g. medians)
Analytical uncertainty propagation	Examining how uncertainty in individual parameters affects the overall uncertainty of the exposure assessment	Analytically or numerically obtain a partial derivative of the exposure equation with respect to each input parameter
Probabilistic uncertainty analysis	Varying each of the input variables over various values of their respective probability distributions	Assign probability density function to each parameter; randomly sample values from each distribution and insert them in the exposure equation (Monte Carlo)
Classical statistical methods	Estimating the population exposure distribution directly, based on measured values from a representative sample	Compute confidence interval estimates for various percentiles of the exposure distribution
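The first approach in Table 2.21, one-at-a-time sensitivity analysis, can be illustrated with the halogenated alkane log P model. The sketch uses the central coefficient values and the input ranges listed in Table 2.23. Taking the midpoint of each range as the nominal value is an assumption standing in for the medians mentioned in the table.

```python
# Central coefficient values from Table 2.23 for the halogenated alkane model
# log P = b0 + b1*E_LUMO + b2*N_Cl + b3*N_C
B0, B1, B2, B3 = 2.635, -11.970, 0.102, 0.533

def log_p(x):
    """Predicted log P for a dict of descriptor values."""
    return B0 + B1 * x["E_LUMO"] + B2 * x["N_Cl"] + B3 * x["N_C"]

# Lower/upper bounds from the ranges in Table 2.23; midpoints are used as
# nominal values (an assumption standing in for medians).
BOUNDS = {"E_LUMO": (0.05, 0.22), "N_Cl": (1, 5), "N_C": (1, 10)}
NOMINAL = {name: (lo + hi) / 2 for name, (lo, hi) in BOUNDS.items()}

def one_at_a_time(bounds=BOUNDS, nominal=NOMINAL):
    """Change in log P when one input moves from its lower to its upper
    bound while all other inputs are held at their nominal values."""
    swings = {}
    for name, (lo, hi) in bounds.items():
        low_case = {**nominal, name: lo}
        high_case = {**nominal, name: hi}
        swings[name] = log_p(high_case) - log_p(low_case)
    return swings

swings = one_at_a_time()
for name, swing in sorted(swings.items(), key=lambda kv: -abs(kv[1])):
    print(f"{name:7s} swing = {swing:+.3f}")
```

N_C produces the largest swing (about 4.8 log units), followed by E_LUMO (about −2.0) and N_Cl (about 0.4). Because the model is linear, each swing is simply the coefficient times the width of the input range and does not depend on the nominal values chosen. One-at-a-time analysis becomes more informative for nonlinear models.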

2.5.1 Quantification of QSAR Model’s Uncertainty

For a QSAR model in which the result r is a function of j variables X₁, X₂, …, X_j:

$$ r = r(X_1, X_2, \ldots, X_j) \tag{2.22} $$

The uncertainty due to errors in the variables can be expressed as follows (Coleman and Steele, 1989):

(2.23)

(2.24)

Applying the above uncertainty equation to log P as the function r, with E_LUMO, N_Cl, and N_C as the three variables, gives the uncertainty equation of log P in Equations (2.25) and (2.26):

(2.25)

(2.26)
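For a linear model, the partial derivatives reduce to the regression coefficients. The sketch below rests on two assumptions. First, Equations (2.23)–(2.26) take the standard first-order, root-sum-square form of Coleman and Steele with independent errors. Second, the model has the linear form log P = b₀ + b₁E_LUMO + b₂N_Cl + b₃N_C used in Table 2.22. Under these assumptions, the uncertainty of log P due to the three descriptors would be:

$$
U_{\log P}^{2}
= \left(\frac{\partial \log P}{\partial E_{\mathrm{LUMO}}}\right)^{2} U_{E_{\mathrm{LUMO}}}^{2}
+ \left(\frac{\partial \log P}{\partial N_{\mathrm{Cl}}}\right)^{2} U_{N_{\mathrm{Cl}}}^{2}
+ \left(\frac{\partial \log P}{\partial N_{\mathrm{C}}}\right)^{2} U_{N_{\mathrm{C}}}^{2}
= b_1^{2}\, U_{E_{\mathrm{LUMO}}}^{2} + b_2^{2}\, U_{N_{\mathrm{Cl}}}^{2} + b_3^{2}\, U_{N_{\mathrm{C}}}^{2}
$$

Only the descriptor uncertainties are propagated here. Uncertainty in the coefficients b₀ to b₃ would add further terms, which the Monte Carlo procedure of Section 2.5.2 handles by sampling the coefficients as well.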

When the data reduction expression is complex and computing the partial derivatives in the above equations is laborious, Monte Carlo simulation (MCS) can be used instead, because it propagates the full distributions of the independent variables (Cox et al., 2001). Here, MCS is used to quantify the uncertainty of the predicted log P, with E_LUMO, N_Cl, and N_C as the three independent variables.
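When the partial derivatives are tedious to derive by hand, they can also be approximated numerically, the option Table 2.21 lists for analytical uncertainty propagation. A minimal sketch, assuming independent inputs and central finite differences (the step size `h` is an arbitrary choice):

```python
import numpy as np

def first_order_uncertainty(f, x0, u, h=1e-6):
    """First-order propagation for independent inputs:
    u(r)^2 = sum_i (df/dX_i)^2 * u(X_i)^2, with central-difference derivatives.

    f  : callable taking a 1-D array of inputs and returning a scalar
    x0 : nominal input values; u : standard uncertainties of the inputs
    """
    x0 = np.asarray(x0, dtype=float)
    u = np.asarray(u, dtype=float)
    grad = np.empty_like(x0)
    for i in range(x0.size):
        step = np.zeros_like(x0)
        step[i] = h
        grad[i] = (f(x0 + step) - f(x0 - step)) / (2.0 * h)  # df/dX_i
    return float(np.sqrt(np.sum((grad * u) ** 2)))

# Illustration with the central coefficients of Table 2.23. The nominal inputs
# are range midpoints and the u values approximate the standard deviations
# implied by the Table 2.23 input distributions (illustrative assumptions).
def log_p(x):
    e_lumo, n_cl, n_c = x
    return 2.635 - 11.970 * e_lumo + 0.102 * n_cl + 0.533 * n_c

u_log_p = first_order_uncertainty(log_p, x0=[0.135, 3.0, 5.5], u=[0.0416, 1.414, 2.872])
print(f"u(log P) ~ {u_log_p:.3f}")
```

For this linear model, the result equals √(b₁²u²(E_LUMO) + b₂²u²(N_Cl) + b₃²u²(N_C)) up to rounding. The finite differences matter only for nonlinear models.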

2.5.2 Monte Carlo Simulation

Monte Carlo simulation (MCS) has been used extensively to quantify the uncertainty of linear equations (Tang et al., 2009). Estimating the propagation of error distributions by MCS rests on theoretical principles and supports a fully consistent and transferable estimation of measurement uncertainty. In the general case, an output Z is obtained indirectly from the input variables X₁, X₂, …, X_n through a functional relationship F:

$$ Z = F(X_1, X_2, \ldots, X_n) \tag{2.27} $$

Each quantity X_i is treated as a continuous random variable. The knowledge about the values that may reasonably be attributed to it is expressed by its probability density function (PDF), g_i(ξ), over the corresponding domain. The expectation (best estimate) of X_i, E(X_i), and its associated standard uncertainty u(X_i), identified with the standard deviation σ(X_i), are obtained from the PDF:

$$ E(X_i) = \int \xi \, g_i(\xi) \, d\xi \tag{2.28} $$

$$ u^2(X_i) = \int \left[\xi - E(X_i)\right]^2 g_i(\xi) \, d\xi \tag{2.29} $$

Suppose F is linearized by a first‐order Taylor expansion about the point μ = (μ₁, μ₂, …, μ_n) of input expectations, μ_i = E(X_i), and the second‐ and higher‐order terms are neglected. The output then becomes

$$ Z \approx F(\boldsymbol{\mu}) + \sum_{i=1}^{n} \left.\frac{\partial F}{\partial X_i}\right|_{\boldsymbol{\mu}} \left(X_i - \mu_i\right) \tag{2.30} $$

Taking E(Z) ≈ F(μ) = F(μ₁, μ₂, …, μ_n) and the variance of Equation (2.30) leads to the well‐known law of propagation of uncertainty:

$$ u^2(Z) = \sum_{i=1}^{n} \left(\frac{\partial F}{\partial X_i}\right)^2 u^2(X_i) + 2 \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \frac{\partial F}{\partial X_i}\,\frac{\partial F}{\partial X_j}\, \operatorname{cov}(X_i, X_j) \tag{2.31} $$

The Guide to the Expression of Uncertainty in Measurement (GUM, 1993) is the internationally accepted master document for evaluating uncertainty. In it, the combined standard uncertainty u(Z) is evaluated from the standard uncertainties of the input variables, u(X_i), and the covariances between correlated inputs, cov(X_i, X_j). If all input variables are independent, cov(X_i, X_j) = 0 and the second sum in Equation (2.31) vanishes.

The essence of MCS is to simulate sampling from a target population with a given expectation μ_z and variance σ²(Z). A random sample of size M is obtained by simulating M independent and identically distributed random variables Z₁, Z₂, …, Z_M. According to the central limit theorem (Martinez, 2002), their sum is approximately normally distributed, with expectation Mμ_z and variance Mσ²(Z):

$$ \sum_{k=1}^{M} Z_k \sim N\!\left(M\mu_z,\; M\sigma^2(Z)\right) \tag{2.32} $$

According to the "two‐sigma" rule, the probability

$$ P\left( \left| \sum_{k=1}^{M} Z_k - M\mu_z \right| < 2\sigma(Z)\sqrt{M} \right) \approx 0.95 \tag{2.33} $$

Dividing both sides of the inequality in Equation (2.33) by M gives

$$ P\left( \left| \frac{1}{M}\sum_{k=1}^{M} Z_k - \mu_z \right| < \frac{2\sigma(Z)}{\sqrt{M}} \right) \approx 0.95 \tag{2.34} $$

Substituting the sample mean z̄ = (1/M) Σ Z_k into Equation (2.34) gives

$$ P\left( \left| \bar{z} - \mu_z \right| < \frac{2\sigma(Z)}{\sqrt{M}} \right) \approx 0.95 \tag{2.35} $$

This relationship is the foundation of MCS because it sets the rule for evaluating the error of the simulated mean z̄ as 2σ(Z)/√M. That error shrinks in proportion to 1/√M as the number of trials grows. The variance σ²(Z) may be estimated by the sample variance s²(Z):

$$ s^2(Z) = \frac{1}{M-1} \sum_{k=1}^{M} \left(Z_k - \bar{z}\right)^2 \tag{2.36} $$

After the coverage probability P is defined, the confidence interval for the result is evaluated as [Z_low, Z_high]. Its extremes correspond to the 2.5th and 97.5th percentiles of the sorted Z values. When the skewness of the forecast distribution of Z is near zero, the confidence interval becomes symmetric and the expanded uncertainty U(Z) is estimated by Equation (2.37):

(2.37)

MCS is a powerful tool for quantifying uncertainty that complements experimental, analytical, and other numerical methods. An MCS protocol replaces point estimates with random variables drawn from probability density functions. To perform MCS, a computer program generates pseudorandom numbers to simulate values of the variables from their PDFs. Commercial programs such as Crystal Ball, LabVIEW, @RISK, and Analytica are popular for this purpose. Carrying out MCS with Crystal Ball (Oracle, 2016) involves three major steps:

1. A probability distribution is assigned to a spreadsheet cell. Each time the spreadsheet is recalculated, a new value of the random variable is drawn from the distribution and used in the calculations.
2. The simulation is run for a large number of trials, at least 10 000. Each trial draws new values of the random variables and produces a new estimate of the final target.
3. The simulation results are summarized in a user‐friendly form, such as a table or a figure.
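The three Crystal Ball steps (assign distributions, recalculate many times, summarize) can be mimicked with NumPy. The sketch below is not Crystal Ball's algorithm. It assumes independent inputs and uses a hypothetical toy model. It reports the sample mean, the sample standard deviation of Equation (2.36), the two-sigma error of the mean from Equation (2.35), the 2.5th to 97.5th percentile interval, and the sample skewness used to judge whether that interval is symmetric.

```python
import numpy as np

def monte_carlo(model, samplers, m=10_000, seed=42):
    """Generic Monte Carlo uncertainty propagation.

    model    : function mapping a dict of input arrays to an output array Z
    samplers : dict name -> function(rng, m) returning m random draws
    """
    rng = np.random.default_rng(seed)
    inputs = {name: draw(rng, m) for name, draw in samplers.items()}  # step 1
    z = model(inputs)                             # step 2: evaluate all M trials
    mean = z.mean()
    s = z.std(ddof=1)                             # sample standard deviation
    low, high = np.percentile(z, [2.5, 97.5])     # 95 % coverage interval
    return {                                      # step 3: summarize
        "mean": mean,
        "s": s,
        "mean_error_95": 2 * s / np.sqrt(m),      # two-sigma error of the mean
        "interval_95": (low, high),
        "skewness": np.mean(((z - mean) / z.std()) ** 3),  # moment skewness
    }

# Toy example (hypothetical): Z = X1 + X2 with independent normal inputs
summary = monte_carlo(
    model=lambda x: x["X1"] + x["X2"],
    samplers={
        "X1": lambda rng, m: rng.normal(1.0, 0.3, m),
        "X2": lambda rng, m: rng.normal(2.0, 0.4, m),
    },
)
print(summary)
```

For this toy model, the exact answer is a normal distribution with mean 3 and standard deviation 0.5. That makes it easy to check how the simulated summary approaches the exact values as `m` grows.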

Figure 2.21 illustrates a step‐by‐step procedure for assessing the uncertainty of a regression QSAR model for halogenated DBPs. The first step is compiling the DBP data. Three variables were used: E_LUMO, the number of chlorine atoms (N_Cl), and the number of carbon atoms (N_C). Their probability distributions are characterized with the statistical software SAS 9.4 to obtain the mean and standard deviation of the data. All these parameters are fed into the MCS. The MCS gives the results as a probability distribution around a mean value, which is then used for a detailed sensitivity analysis. The sensitivity analysis identifies which parameters have the most impact on the predicted log P. If small changes in one parameter's distribution strongly influence the final result, the sensitivity to that variable is high. Sensitivity is therefore crucial for determining which variables matter most in predicting a dependent outcome. It can be displayed as the percentage contribution of each parameter to the variance of the final result. Crystal Ball reports a percentage contribution for each independent variable, with the contributions summing to 100%. A general procedure for quantifying the uncertainty of the predicted log P of a DBP class through MCS is outlined as follows:

Selection of significant sources of uncertainty. The molecular descriptors E_LUMO, N_Cl, and N_C, the intercept b₀, and the coefficients b₁, b₂, and b₃ are the significant sources contributing to the model's uncertainty.
Identification of the probability density function corresponding to each selected uncertainty source. All uncertainty sources (E_LUMO, N_Cl, N_C, b₀, b₁, b₂, and b₃) are analyzed with Crystal Ball to obtain their best‐fit probability distribution functions. These functions are then used in the best‐fit MLR equation to predict log P.
Selection of the number M of Monte Carlo Simulation trials. For example, MCS was carried out 10 000 times to obtain a sufficiently high number of trials.
Simulation of M samples {X_i1, X_i2, …, X_iM} for each uncertainty source X_i (E_LUMO, N_Cl, N_C, b₀, b₁, b₂, and b₃). Each source is considered a random variable with the probability density function p(E_LUMO), p(N_Cl), p(N_C), p(b₀), p(b₁), p(b₂), or p(b₃), respectively.
Computation of the M results {Z₁, Z₂, …, Z_M} by applying Equation (2.27) or (2.37) to the M samples of each variable X_i (E_LUMO, N_Cl, N_C, b₀, b₁, b₂, and b₃).
Analysis and interpretation of Monte Carlo Simulation results.

Flowchart depicting the procedure of Monte Carlo simulation in uncertainty assessment, starting from compilation of DBP data to selection of essential parameters leading to analysis and discussion of results. — **Figure 2.21** Procedure for Monte Carlo simulation in uncertainty assessment.

(*Source:* Adapted from the US EPA (1999).)

Example 2.9 presents the MCS results produced by Crystal Ball software. The methodology can be used for the quantification of uncertainties of any regression curve in general.

Example 2.9

Given: Chen et al. (2016) developed the following QSAR models to predict log P of halogenated alkanes (Table 2.22).

Table 2.22 List of MLR equations for halogenated alkanes.

No.	R²	MLR equation
1	0.887	log P = 2.679 – 12.068 * E_LUMO + 0.092 * N_Cl + 0.531 * N_C
2	0.893	log P = 2.641 – 12.143 * E_LUMO + 0.104 * N_Cl + 0.536 * N_C
3	0.890	log P = 2.634 – 11.995 * E_LUMO + 0.103 * N_Cl + 0.531 * N_C
5	0.894	log P = 2.696 – 12.207 * E_LUMO + 0.089 * N_Cl + 0.535 * N_C
6	0.892	log P = 2.635 – 12.084 * E_LUMO + 0.103 * N_Cl + 0.535 * N_C
7	0.890	log P = 2.626 – 11.951 * E_LUMO + 0.101 * N_Cl + 0.533 * N_C
8	0.891	log P = 2.632 – 11.967 * E_LUMO + 0.100 * N_Cl + 0.534 * N_C
9	0.885	log P = 2.630 – 11.918 * E_LUMO + 0.103 * N_Cl + 0.528 * N_C
10	0.884	log P = 2.664 – 12.535 * E_LUMO + 0.106 * N_Cl + 0.547 * N_C
11	0.891	log P = 2.637 – 12.137 * E_LUMO + 0.104 * N_Cl + 0.537 * N_C
12	0.891	log P = 2.636 – 12.130 * E_LUMO + 0.104 * N_Cl + 0.537 * N_C
13	0.893	log P = 2.638 – 12.071 * E_LUMO + 0.103 * N_Cl + 0.534 * N_C
14	0.878	log P = 2.625 – 11.833 * E_LUMO + 0.103 * N_Cl + 0.526 * N_C
15	0.850	log P = 2.614 – 11.606 * E_LUMO + 0.102 * N_Cl + 0.518 * N_C
16	0.890	log P = 2.631 – 11.979 * E_LUMO + 0.103 * N_Cl + 0.531 * N_C
17	0.890	log P = 2.632 – 11.986 * E_LUMO + 0.103 * N_Cl + 0.531 * N_C
18	0.889	log P = 2.652 – 12.345 * E_LUMO + 0.105 * N_Cl + 0.542 * N_C
19	0.895	log P = 2.363 – 10.840 * E_LUMO + 0.136 * N_Cl + 0.532 * N_C
21	0.888	log P = 2.612 – 11.780 * E_LUMO + 0.102 * N_Cl + 0.530 * N_C
22	0.891	log P = 2.616 – 11.836 * E_LUMO + 0.103 * N_Cl + 0.533 * N_C
23	0.894	log P = 2.587 – 11.713 * E_LUMO + 0.106 * N_Cl + 0.532 * N_C
24	0.894	log P = 2.592 – 11.809 * E_LUMO + 0.107 * N_Cl + 0.536 * N_C
25	0.885	log P = 2.632 – 11.887 * E_LUMO + 0.100 * N_Cl + 0.531 * N_C
26	0.901	log P = 2.640 – 12.288 * E_LUMO + 0.106 * N_Cl + 0.548 * N_C
27	0.889	log P = 2.632 – 11.930 * E_LUMO + 0.100 * N_Cl + 0.532 * N_C
28	0.892	log P = 2.618 – 11.895 * E_LUMO + 0.103 * N_Cl + 0.534 * N_C
29	0.891	log P = 2.634 – 11.963 * E_LUMO + 0.101 * N_Cl + 0.533 * N_C
30	0.891	log P = 2.718 – 12.333 * E_LUMO + 0.092 * N_Cl + 0.533 * N_C
31	0.891	log P = 2.613 – 11.885 * E_LUMO + 0.102 * N_Cl + 0.533 * N_C
32	0.894	log P = 2.626 – 11.938 * E_LUMO + 0.103 * N_Cl + 0.535 * N_C
33	0.897	log P = 2.633 – 12.077 * E_LUMO + 0.104 * N_Cl + 0.540 * N_C
34	0.891	log P = 2.628 – 11.954 * E_LUMO + 0.100 * N_Cl + 0.534 * N_C
35	0.890	log P = 2.623 – 11.928 * E_LUMO + 0.103 * N_Cl + 0.533 * N_C
36	0.903	log P = 2.569 – 11.768 * E_LUMO + 0.123 * N_Cl + 0.534 * N_C
37	0.891	log P = 2.613 – 11.877 * E_LUMO + 0.101 * N_Cl + 0.534 * N_C
38	0.891	log P = 2.628 – 11.958 * E_LUMO + 0.102 * N_Cl + 0.534 * N_C
39	0.903	log P = 2.572 – 11.788 * E_LUMO + 0.123 * N_Cl + 0.534 * N_C
40	0.892	log P = 2.624 – 11.960 * E_LUMO + 0.104 * N_Cl + 0.534 * N_C
41	0.891	log P = 2.627 – 11.926 * E_LUMO + 0.099 * N_Cl + 0.533 * N_C
42	0.895	log P = 2.703 – 12.223 * E_LUMO + 0.078 * N_Cl + 0.535 * N_C
43	0.898	log P = 2.599 – 11.640 * E_LUMO + 0.083 * N_Cl + 0.533 * N_C
45	0.917	log P = 2.888 – 12.354 * E_LUMO + 0.080 * N_Cl + 0.527 * N_C

Find: Use MCS to quantify the uncertainty and sensitivity of the QSAR models in Table 2.22 for predicting log P of DBPs from the molecular descriptors E_LUMO (the energy of the lowest unoccupied molecular orbital), N_Cl (the number of chlorine atoms), and N_C (the number of carbon atoms).

Solution:

Table 2.22 lists the MLR equations developed by LOO CV. Table 2.23 lists the probability distribution fitted to each of E_LUMO, N_Cl, N_C, b₀, b₁, b₂, and b₃, analyzed one by one with Crystal Ball. These distributions are defined in Crystal Ball to forecast log P. A trial number of M = 10 000 is selected, and the results from Crystal Ball are shown in Figures 2.22 and 2.23. Figure 2.22 shows that the forecast log P follows a bell‐shaped beta distribution with a mean of 2.61, a standard deviation of 1.24, and a skewness of 0.0422. The model's uncertainty could also be calculated by Equation (2.37). Ranked by the Anderson–Darling statistic, the beta distribution fits the forecast log P best, and the normal distribution ranks second.

Table 2.23 List of variables in QSAR model of halogenated alkanes.

Variable	Distribution type	Defining characteristics
b₀	Student’s T	Middle point 2.635, scale 0.020, degree of freedom 1.001
b₁	Logistic	Mean −11.970, scale 0.131
b₂	Logistic	Mean 0.102, scale 0.007
b₃	Logistic	Mean 0.533, scale 0.002
E_LUMO	Beta	Minimum 0.05, maximum 0.22, alpha 1.682, beta 0.6348
N_Cl	Discrete uniform	Minimum 1, maximum 5
N_C	Discrete uniform	Minimum 1, maximum 10
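A NumPy sketch of this simulation follows. It makes several interpretive assumptions:

- The Student's t distribution is taken as location + scale × standard t.
- The logistic "mean" is used as the location.
- The beta distribution is rescaled to [minimum, maximum].
- The discrete uniforms include both endpoints.

Because b₀'s t distribution has only about one degree of freedom (nearly Cauchy), the sample mean and standard deviation are dominated by rare extreme draws. The sketch therefore reports the median and a percentile interval, and its numbers are not expected to match Figure 2.22 exactly.

The contribution calculation approximates Crystal Ball's contribution-to-variance view. As we understand its documentation, that view squares rank correlation coefficients and normalizes them to 100%; treat this as an assumption.

```python
import numpy as np

M = 10_000
rng = np.random.default_rng(2016)

# Draws following Table 2.23 (parameterizations as interpreted in the lead-in)
samples = {
    "b0": 2.635 + 0.020 * rng.standard_t(1.001, M),               # Student's t
    "b1": rng.logistic(-11.970, 0.131, M),                         # logistic
    "b2": rng.logistic(0.102, 0.007, M),
    "b3": rng.logistic(0.533, 0.002, M),
    "E_LUMO": 0.05 + (0.22 - 0.05) * rng.beta(1.682, 0.6348, M),  # beta on [min, max]
    "N_Cl": rng.integers(1, 6, M),                                 # discrete uniform 1..5
    "N_C": rng.integers(1, 11, M),                                 # discrete uniform 1..10
}
s = samples
log_p = s["b0"] + s["b1"] * s["E_LUMO"] + s["b2"] * s["N_Cl"] + s["b3"] * s["N_C"]

def average_ranks(a):
    """Ranks 1..n, with tied values given the average of their ranks."""
    order = np.argsort(a, kind="mergesort")
    ranks = np.empty(len(a))
    ranks[order] = np.arange(1, len(a) + 1)
    _, inverse, counts = np.unique(a, return_inverse=True, return_counts=True)
    return (np.bincount(inverse, weights=ranks) / counts)[inverse]

def contribution_to_variance(inputs, output):
    """Squared Spearman rank correlations, normalized to 100 % and signed."""
    r_out = average_ranks(output)
    rho = {k: np.corrcoef(average_ranks(v), r_out)[0, 1] for k, v in inputs.items()}
    total = sum(r ** 2 for r in rho.values())
    return {k: 100 * np.sign(r) * r ** 2 / total for k, r in rho.items()}

contrib = contribution_to_variance(samples, log_p)
print("median log P:", np.median(log_p))
print("95 % interval:", np.percentile(log_p, [2.5, 97.5]))
for k, c in sorted(contrib.items(), key=lambda kv: -abs(kv[1])):
    print(f"{k:7s} {c:+6.1f} %")
```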

Figure 2.22 Log P output of halogenated alkane by Monte Carlo simulation.

(Source: Adapted from Chen (1999).)

Figure 2.23 Contribution to variance view of halogenated alkane by Monte Carlo simulation.

(Source: Adapted from Chen (1999).)

The sensitivity chart (Figure 2.23) shows that N_C contributes +91.7% and N_Cl −8.2% to the variance of the predicted log P, while E_LUMO has almost no influence. This suggests that log P is nearly independent of E_LUMO.

Figure 2.24 is a correlation matrix that lets a viewer identify patterns in the correlations visually. The more closely the points follow the regression line, the stronger the correlation between the variables. Figure 2.24 shows that log P has its highest correlation coefficient, 0.9006, with N_C and its lowest, 0.1003, with N_Cl. The very low correlation coefficients among E_LUMO, N_Cl, and N_C suggest that the three molecular descriptors are independent.

Figure 2.24 Matrix view of halogenated alkane by Monte Carlo simulation.

(Source: Adapted from Chen (1999).)

2.5.3 Comparison of Uncertainties of Different QSAR Models

The uncertainty of log P can also be estimated by the point estimate method (PEM). In most cases, uncertainty quantified by MCS should be more accurate than that calculated by PEM. MCS accounts for the full distributions of the inputs and the propagation of their uncertainty. Example 2.10 compares the uncertainties of the QSAR models obtained by PEM and by MCS.

Example 2.10

Given: Chen (1999) quantified the uncertainties of DBP QSAR models, expressed as standard deviations, using the point estimate method (PEM) and the MCS method. The results are listed in Table 2.24.

Table 2.24 Uncertainties obtained by point estimate and MCS.

No.	DBP class	Point estimate method	Monte Carlo method
1	Halogenated alkane	1.81	2.48
2	Halogenated alkene	2.46	1.82
3	Halogenated aromatic	1.74	2.06
4	Halogenated aldehyde	2.15	2.98
5	Halogenated ketone	2.32	2.68
6	Halogenated carboxylic acid	2.14	2.68

Find:

Which uncertainty estimate is more reliable, the one from the point estimate method (PEM) or the one from MCS?
What are the maximal and minimal errors when the uncertainties are compared?

Solution:

The two methods give different uncertainties, as shown in Figure 2.25. For all DBP classes except halogenated alkene, the uncertainties computed by MCS are higher than those computed by PEM. The MCS results are more reliable because MCS simulates the full distributions of the variables, whereas PEM considers only the first‐order model errors.

Figure 2.25 Uncertainty obtained by point estimate and Monte Carlo methods.

(Source: Adapted from Chen (1999).)
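- PEM gives a lower uncertainty than MCS for five classes. Among them, the maximal error is for the QSAR model of log P of halogenated aldehyde, where PEM underestimates the uncertainty by 27.85% (−27.85%).
- Among the same five classes, the minimal error is for the QSAR model of log P of halogenated ketone, where PEM underestimates the uncertainty by 13.43% (−13.43%). For halogenated alkene, by contrast, the PEM uncertainty exceeds the MCS value by about 35%.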
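The percentages above match the relative difference (PEM − MCS)/MCS × 100. The sketch below recomputes it for all six classes from Table 2.24.

```python
# Standard deviations of predicted log P from Table 2.24: (PEM, MCS)
TABLE_2_24 = {
    "Halogenated alkane": (1.81, 2.48),
    "Halogenated alkene": (2.46, 1.82),
    "Halogenated aromatic": (1.74, 2.06),
    "Halogenated aldehyde": (2.15, 2.98),
    "Halogenated ketone": (2.32, 2.68),
    "Halogenated carboxylic acid": (2.14, 2.68),
}

# Relative difference of the PEM uncertainty with respect to MCS, in percent
rel_diff = {cls: (pem - mcs) / mcs * 100 for cls, (pem, mcs) in TABLE_2_24.items()}
for cls, d in rel_diff.items():
    print(f"{cls:28s} {d:+7.2f} %")
```

Only halogenated alkene has a positive difference, which is why it is excluded when the maximal and minimal underestimates are compared.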

Comments: In general, MCS is more accurate than PEM, provided the assumed probability density functions are correct.

2.5.4 Sensitivity Analysis by Monte Carlo Simulation

The influence of different variables on the model's prediction can be presented visually through sensitivity analysis in Crystal Ball. Table 2.25 summarizes the sensitivity results for each DBP class. Figure 2.26 shows that N_Cl is the most influential molecular descriptor in predicting log P for all DBP classes except halogenated alkane.

Table 2.25 Sensitivity of various descriptors to log P by DBP classes.

No.	Compound class	E_LUMO (%)	N_Cl (%)	N_C (%)
1	Halogenated alkane	−16.0	0.8	83.2
2	Halogenated alkene	−44.0	53.9	2.1
3	Halogenated aromatic	0.6	90.6	8.7
4	Halogenated aldehyde	−7.3	64.9	27.8
5	Halogenated ketone	0.0	93.0	7.0
6	Halogenated carboxylic acid	36.5	55.6	7.9

Grouped bar chart with three shaded bars (E_LUMO, N_Cl, and N_C) for each of DBP groups 1–6. — **Figure 2.26** Sensitivity of various descriptors to log P by DBP classes.

(*Source:* Adapted from Chen (1999).)

Table 2.25 reveals a major difference in sensitivity between single‐bonded compounds, such as halogenated alkanes, and the other DBP classes. Log P of chlorinated alkanes changes significantly with the length of the carbon chain, N_C. For all the other DBP classes, which contain unsaturated π bonds, log P changes mostly with the number of chlorine atoms, N_Cl. Because chlorinated alkanes contain no oxygen, their log P reflects the extent of hydrogen bonding between the compound and the hydrogen of the water molecule. When chlorine is attached to a saturated carbon, it appears to influence only the electron cloud of that carbon, and this electronic interaction is tightly bound. Therefore, the number of chlorine atoms on the alkane chain does not significantly influence the strength of hydrogen bonding. In DBPs containing unsaturated π bonds, by contrast, the chlorine atom can attract the electron cloud toward itself. Most compounds with unsaturated bonds may have resonance structures through which the influence of chlorine is transmitted. As a result, the strength of hydrogen bonding between the compound and water is significantly influenced by N_Cl, and log P is more sensitive to the number of chlorine atoms.

2.5.5 Computer Software for Quantitative Risk Assessment

Many software packages are available for conducting risk assessment. Table 2.26 lists several of them for quantitative risk assessment.

Table 2.26 Software for quantitative risk assessment.

Software	Type of analysis	Creator
@RISK	Uncertainty and risk analysis	Palisade Corporation
Analytica	Uncertainty and risk analysis	Lumina Decision Systems, Inc.
GENII/SUNS	Uncertainty and sensitivity analysis	Sandia National Laboratories; Pacific Northwest Laboratory
MOUSE	Uncertainty analysis	EPA, Risk Reduction Engineering Laboratory
ORMONTE	Uncertainty and sensitivity analysis	Oak Ridge National Laboratory
Risk Calc	Uncertainty and risk analysis	Applied Biomathematics
SimLab	Uncertainty and sensitivity analysis	Simlab
TAM3	Uncertainty and sensitivity analysis	Oak Ridge National Laboratory
Uncertainty Analysis	Uncertainty analysis	Integrated Sciences Group
Crystal Ball®	Uncertainty and sensitivity analysis	Oracle

2.6 Exercise

2.6.1 Questions

What do typical molecular descriptors (MDs) such as log P, E_LUMO, and E_HOMO mean?
What is quantitative structure–activity relationship (QSAR)?
How can QSAR be used in HRA?
What are the typical molecular descriptors?
What EPA QSAR tools can be used for HRA?
What are the four steps in risk assessment?
What are the four different ways to identify the hazards of a chemical compound?
How do you calculate RfD?
How do you calculate BMD and BMDL?
What are the differences between assessment procedures of carcinogenic and noncarcinogenic chemical compounds?
How is the carcinogenic risk of a chemical compound calculated?

2.6.2 Calculation

Wang et al. (2014) reported that halobenzoquinones (HBQs), a class of DBPs, are likely to be carcinogenic. They also found that the IC₅₀ values of HBQs are always lower than those of their hydroxylated counterparts, the halo‐hydroxyl‐benzoquinones (OH‐HBQs). For toxicity to Chinese hamster ovary CCL‐61 cells (CHO‐K1; ATCC, Manassas, VA), the IC₅₀ values (μM) for 24‐hour incubation of DCBQ, DCMBQ, TriCBQ, and DBBQ are 27.3, 11.4, 45.5, and 19.8, respectively. The IC₅₀ values (μM) of the corresponding hydroxylated compounds, the OH‐HBQs, are 61.0, 20.4, 64.4, and 42.8, respectively. Answer the following questions:
1. Which class of the HBQs or OH‐HBQs is more toxic? Why?
2. Calculate the corresponding BMDL values using the EPA BMDS 2.6.0.1 software. Is there any correlation between the BMDL and IC₅₀ values?
3. Explain why a correlation does or does not exist between these data for both HBQs and OH‐HBQs.
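The first question can be answered by comparing each pair of IC₅₀ values. Because a lower IC₅₀ means greater cytotoxicity, a ratio of OH‐HBQ IC₅₀ to HBQ IC₅₀ above 1 indicates that the parent HBQ is the more toxic compound. The minimal Python sketch below uses only the values quoted above. The `pearson_r` helper is hypothetical and written out to stay dependency-free; it can be reused for question 2 once BMDL values are available from BMDS.

```python
# 24-h IC50 values (uM) in CHO-K1 cells, as quoted in the exercise (Wang et al. 2014)
HBQ_IC50 = {"DCBQ": 27.3, "DCMBQ": 11.4, "TriCBQ": 45.5, "DBBQ": 19.8}
OH_HBQ_IC50 = {"DCBQ": 61.0, "DCMBQ": 20.4, "TriCBQ": 64.4, "DBBQ": 42.8}


def fold_difference(parent, hydroxylated):
    """Ratio OH-HBQ IC50 / HBQ IC50; a ratio > 1 means the parent HBQ is more cytotoxic."""
    return {name: hydroxylated[name] / parent[name] for name in parent}


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


ratios = fold_difference(HBQ_IC50, OH_HBQ_IC50)
for name, ratio in ratios.items():
    print(f"{name}: OH-HBQ IC50 is {ratio:.2f} times the HBQ IC50")
```

With these data every ratio exceeds 1 (about 2.23, 1.79, 1.42, and 2.16 for DCBQ, DCMBQ, TriCBQ, and DBBQ), which is consistent with HBQs being the more cytotoxic class.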
Chloroform is one of the disinfection by‐products of chlorinated water. Calculate the BMD and the BMDL (the lower 95% confidence limit of the BMD) of chloroform by using the EPA BMDS 2.6.0.1 software, which is available at https://www.epa.gov/bmds/download-benchmark-dose-software-bmds#installing
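To see what the BMD represents, consider the quantal‐linear model, one of the dichotomous models offered in BMDS: P(d) = g + (1 − g)(1 − exp(−βd)). Its extra risk, [P(d) − P(0)] / [1 − P(0)], simplifies to 1 − exp(−βd), so the dose giving a benchmark response (BMR) is BMD = −ln(1 − BMR)/β. The Python sketch below assumes β has already been fitted. It does not compute the BMDL, which BMDS obtains from the likelihood of the fitted dose–response data. The function names are hypothetical.

```python
import math


def extra_risk_quantal_linear(dose, beta):
    """Extra risk for P(d) = g + (1 - g)(1 - exp(-beta*d)); the background g cancels."""
    return 1.0 - math.exp(-beta * dose)


def bmd_quantal_linear(beta, bmr=0.10):
    """Dose at which the extra risk equals the benchmark response (default 10%)."""
    if not 0.0 < bmr < 1.0:
        raise ValueError("BMR must lie strictly between 0 and 1")
    return -math.log(1.0 - bmr) / beta
```

For a fitted β of 0.01 (mg/kg-day)⁻¹, `bmd_quantal_linear(0.01)` gives a BMD₁₀ of about 10.5 mg/kg-day.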
The current FDA guideline for methyl mercury in fish is 1 ppm (i.e., 1 mg of methyl mercury per kilogram of fish). Answer the following:
1. Calculate the dose of mercury (mg/kg body weight of the person) if a person were to eat a tuna fish sandwich (with 3 oz of tuna fish meat) with methyl mercury levels of 0.4 ppm.
2. How does the amount of methyl mercury in a 4‐oz sandwich compare with the maximum daily amount that a pregnant woman should consume according to the official EPA guidelines? Cite your source.
3. How many 4‐oz tuna fish sandwiches would a person need to eat in order to have a 50% probability of dying from acute methyl mercury poisoning? Assume that humans and rats (or mice) have equal sensitivity to methyl mercury, that oral exposure in rats (or mice) is comparable with oral exposure in humans, and that all mercury in the fish is in the form of methyl mercury.
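The dose calculations above follow dose = C × m / BW, where C is the methyl mercury concentration in the fish (mg/kg), m is the mass of fish eaten (kg), and BW is body weight (kg). The Python sketch below makes three assumptions: a 70‐kg adult body weight, the EPA IRIS oral RfD for methylmercury of 1 × 10⁻⁴ mg/kg‐day (verify against the current IRIS entry), and an LD₅₀ that the reader supplies from the toxicological literature. The function names are hypothetical.

```python
OZ_TO_KG = 0.028349523125  # one avoirdupois ounce in kilograms
RFD_MEHG = 1e-4            # mg/kg-day, assumed EPA IRIS oral RfD for methylmercury


def meal_dose(conc_mg_per_kg_fish, fish_oz, body_weight_kg=70.0):
    """Dose (mg per kg body weight) from one fish meal; 70 kg is an assumed adult weight."""
    return conc_mg_per_kg_fish * fish_oz * OZ_TO_KG / body_weight_kg


def meals_to_reach(target_dose_mg_per_kg, conc_mg_per_kg_fish, fish_oz, body_weight_kg=70.0):
    """Number of meals whose combined dose equals a target dose (e.g., an oral LD50)."""
    return target_dose_mg_per_kg / meal_dose(conc_mg_per_kg_fish, fish_oz, body_weight_kg)


dose = meal_dose(0.4, 3)  # 3 oz of tuna at 0.4 ppm
print(f"Dose: {dose:.2e} mg/kg; ratio to RfD: {dose / RFD_MEHG:.1f}")
```

Under these assumptions, one 3‐oz serving at 0.4 ppm delivers about 4.9 × 10⁻⁴ mg/kg, roughly five times the assumed daily RfD. For question 3, passing the LD₅₀ (mg/kg) as `target_dose_mg_per_kg` with `fish_oz=4` gives the number of sandwiches.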
Study the EPA example of a human health risk assessment of isodecyl acrylate. Answer the following:
1. What uncertainty could be introduced in the HRA?
2. How would you outline an MCS procedure to quantify the uncertainty?
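An MCS outline usually follows four steps: assign probability distributions to the uncertain inputs of the dose equation, sample them many times, compute the output for each sample, and summarize the output distribution by percentiles. The minimal Python sketch below illustrates those steps. The intake equation (dose = C × IR / BW), every distribution, and every parameter value are hypothetical placeholders, not values from the EPA isodecyl acrylate assessment.

```python
import math
import random
import statistics


def simulate_doses(n=10_000, seed=42):
    """Sample dose = C * IR / BW (mg/kg-day) n times; all distributions are placeholders."""
    rng = random.Random(seed)
    doses = []
    for _ in range(n):
        conc = rng.lognormvariate(math.log(0.05), 0.5)      # mg/L, lognormal with median 0.05
        intake = rng.triangular(1.0, 3.0, 2.0)               # L/day: min 1, max 3, mode 2
        body_wt = max(rng.normalvariate(70.0, 10.0), 30.0)   # kg, normal clipped at 30 kg
        doses.append(conc * intake / body_wt)
    return doses


def summarize(doses, rfd=None):
    """5th, 50th, and 95th percentiles, plus P(dose > RfD) if an RfD is given."""
    cuts = statistics.quantiles(doses, n=20)  # 19 cut points at 5% steps
    summary = {"p05": cuts[0], "p50": cuts[9], "p95": cuts[18]}
    if rfd is not None:
        summary["p_exceed"] = sum(d > rfd for d in doses) / len(doses)
    return summary


print(summarize(simulate_doses(), rfd=1e-3))
```

To turn this into the outline the question asks for, replace the placeholders with distributions justified in the assessment and increase `n` until the reported percentiles stabilize.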

2.6.3 Assignment

Study the Crystal Ball user manual.
Study the SPSS user manual.
Study the EPA guiding principles for Monte Carlo analysis (US EPA 1997).
Go to the website http://apps.who.int/gho/data/node.main.1?lang=en and download the following World Health Statistics Excel databases:
- Mortality and global health estimates
- Cause‐specific mortality and morbidity
- Selected infectious diseases
- Health service coverage
- Risk factors
- Health systems
- Health equity monitor
- Demographic and socioeconomic statistics

Under the heading “Mortality and global health estimates,” you can download the following database:

Life expectancy for different countries. Use SPSS to conduct the following:

Calculate the mean and standard deviation of life expectancy for developed and developing countries. What do these statistics suggest when the mean life expectancies of the two groups are compared?
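The same descriptive statistics can be reproduced outside SPSS as a cross‐check. The Python sketch below takes one list of life‐expectancy values per country group; how the WHO spreadsheet is parsed into those lists is left to the reader. Welch's t statistic is an added suggestion, not part of the original exercise: it puts the difference in means in units of its standard error without assuming equal variances. The function names are hypothetical.

```python
import statistics


def describe(values):
    """n, mean, sample standard deviation (n - 1 denominator), and coefficient of variation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return {"n": len(values), "mean": mean, "sd": sd, "cv": sd / mean}


def welch_t(a, b):
    """Welch's t statistic for the difference in group means (unequal variances)."""
    se = (statistics.variance(a) / len(a) + statistics.variance(b) / len(b)) ** 0.5
    return (statistics.mean(a) - statistics.mean(b)) / se
```

A larger mean with a smaller standard deviation in one group indicates both longer and more uniform life expectancy across that group's countries.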
Income and environmental quality are two major indicators. Conduct a correlation analysis between life expectancy and income, an air quality indicator, or both. Answer the following:
1. Is life expectancy correlated with income, the air quality indicator, or both?
2. Use Crystal Ball to conduct a sensitivity analysis and state which is the more important predictor of life expectancy.
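As a cross‐check on the correlation and sensitivity questions, both Pearson and Spearman rank correlations can be computed directly. Ranking predictors by the absolute value of their Spearman coefficient with life expectancy is a simple stand‐in for a sensitivity ranking. It is not Crystal Ball's own algorithm. The Python sketch below assumes the country data have already been paired and that countries with missing values have been dropped. All function names are hypothetical.

```python
def pearson_r(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson_r(ranks(x), ranks(y))


def rank_predictors(predictors, outcome):
    """Sort predictors by |Spearman rho| with the outcome, strongest first."""
    scores = {name: spearman_rho(vals, outcome) for name, vals in predictors.items()}
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)
```

Spearman's coefficient is included because income is typically strongly skewed across countries, and rank correlation is less sensitive to that skew than Pearson's.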

2.6.4 Projects

2.6.4.1 Xiongan Project

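Collect air, water, and soil quality data for Xiongan and conduct a health risk assessment of the following major primary pollutants. Exposure factors can be found in the US EPA Exposure Factors Handbook 2011 Edition (US EPA 2011), and toxicity values such as RfDs and slope factors in the US EPA Integrated Risk Information System (IRIS):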

Air pollutants: O₃, SO₂, NO₂, PM_2.5, and PM_10
Water pollutants: total trihalomethanes (TTHM) and five haloacetic acids (HAA5)
Soil pollutants: (a) toxic metals (Hg, Cr, and As), (b) pesticides (parathion, dichlorvos, and aldrin), and (c) herbicides (glyphosate, atrazine, and flazasulfuron)

Answer the following questions:

What are the log P, E_LUMO, and E_HOMO values of the water pollutants?
How do these molecular descriptors contribute to their toxicity?
What are the health risks of these pollutants?
Estimate the number of lung and liver cancer cases in Xiongan attributable to air and water pollutants over the next 5, 10, 20, and 50 years.
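A common screening simplification for the last question has two steps. First, convert a chronic daily intake, CDI = C × IR × EF × ED / (BW × AT), into an excess lifetime cancer risk with the slope factor (risk = CDI × SF). Second, spread that lifetime risk evenly over an assumed 70‐year lifetime to obtain expected cases within each horizon. The Python sketch below assumes a constant population and constant exposure. Its default exposure frequency, duration, and averaging time are illustrative assumptions, not Xiongan data. For inhalation, EPA's current practice multiplies an exposure concentration by an inhalation unit risk instead; the CDI form is kept here for uniformity. The function names are hypothetical.

```python
def chronic_daily_intake(conc, intake_rate, body_weight_kg,
                         exposure_freq_days_per_yr=365.0, exposure_dur_yr=70.0,
                         averaging_time_days=70.0 * 365.0):
    """CDI (mg/kg-day) = C x IR x EF x ED / (BW x AT).

    Use conc in mg/m^3 with IR in m^3/day (air), or mg/L with IR in L/day (water).
    """
    numerator = conc * intake_rate * exposure_freq_days_per_yr * exposure_dur_yr
    return numerator / (body_weight_kg * averaging_time_days)


def excess_cases(cdi, slope_factor, population, horizon_yr, lifetime_yr=70.0):
    """Expected excess cancer cases over a horizon, spreading lifetime risk evenly."""
    lifetime_risk = cdi * slope_factor  # linear low-dose model, valid for small risks
    return population * lifetime_risk * horizon_yr / lifetime_yr


# Hypothetical inputs for illustration only
cdi = 1e-4            # mg/kg-day
slope_factor = 0.5    # (mg/kg-day)^-1
population = 1_000_000
for years in (5, 10, 20, 50):
    print(years, round(excess_cases(cdi, slope_factor, population, years), 1))
```

Separate estimates for lung and liver cancer would use the slope factor and CDI associated with each pollutant and exposure route, summed over the pollutants that act on the same target organ.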

2.6.4.2 Community Project

Repeat the Xiongan project for your hometown and estimate the number of lung and liver cancer cases in your city attributable to air, water, and soil pollutants over the next 5, 10, 20, and 50 years.
Rank the health risks of air, water, and soil pollutants and identify the pollutants that contribute most to them.
Recommend to the local environmental protection agencies which major EEIS should be built first to reduce these risks to less than 10⁻⁶ over the next 5, 10, 20, and 50 years.
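Ranking and prioritization can start from the risk estimates alone. Sort pollutants by risk, then compute the fractional exposure reduction each one needs to fall below the 10⁻⁶ target. The Python sketch below assumes that risk is proportional to exposure, as in the linear low‐dose model. The function names and any pollutant values fed to them are hypothetical.

```python
TARGET_RISK = 1e-6


def required_reduction(current_risk, target=TARGET_RISK):
    """Fraction by which exposure must fall to bring a linear risk down to the target."""
    if current_risk <= target:
        return 0.0
    return 1.0 - target / current_risk


def prioritize(risks, target=TARGET_RISK):
    """(pollutant, risk, required reduction) tuples, highest risk first."""
    ranked = sorted(risks.items(), key=lambda kv: kv[1], reverse=True)
    return [(name, risk, required_reduction(risk, target)) for name, risk in ranked]
```

A pollutant at a risk of 10⁻⁴, for example, would need a 99% reduction in exposure. Pollutants requiring the largest reductions are natural candidates for the first EEIS.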

References

Akaike, H. (1973). Information theory and an extension of the maximum likelihood principle. Proceedings of the Second International Symposium on Information Theory (ed. B.N. Petrov and F. Csaki), 267–281. Budapest: Akademiai Kiado.
Chen, R.Z. (1999). Development, Validation, and Uncertainty Analysis of Quantitative Structure and Activity Relationship Models for log P of Disinfection By‐products. PhD dissertation, Florida International University, Miami, FL.
Chen, R.Z., Tang, W.Z., and Sillanpää, M. (2016). Prediction of log P of halogenated alkanes by their E_LUMO and number of chlorine and carbon. Environmental Processes 1: 73–91.
Chinese Ministry of Environmental Protection (MEP). (2016). Ambient Air Quality Standards. http://english.mep.gov.cn/Resources/standards/Air_Environment/quality_standard1/201605/t20160511_337502.shtml (accessed 29 November 2016).
Coleman, H.W. and Steele, Jr., W.G. (1989). Planning an experiment: General uncertainty analysis. Experimentation and Uncertainty Analysis for Engineers. New York: Wiley Blackwell.
Cothern, C.R., Coniglio, W.A., and Marcus, W.L. (1986). Estimating risk to human health. Environmental Science and Technology 20 (2): 111–116.
Cox, M.G., Dainton, M.P., and Harris, P.M. (2001). Software support for metrology best practice guide. Uncertainty and statistical modeling. Technical Report. Teddington: National Physical Laboratory.
Crump, K. (1984). A new method for determining allowable daily intakes. Fundamental and Applied Toxicology 4: 854–871.
Efron, B. (1983). Estimating the error rate of a prediction rule: improvement on cross‐validation. Journal of the American Statistical Association 78 (382): 316–331.
EPA (U.S. Environmental Protection Agency) (1995). Guidelines for Neurotoxicity Risk Assessment. EPA/630/R‐95/001F. Risk Assessment Forum, US EPA, Washington, DC [online]. www.epa.gov/ncea/raf/pdfs/neurotox.pdf (accessed 30 March 2005).
Eriksson, L., Jaworska, J., Worth, A.P., et al. (2003). Methods for reliability and uncertainty assessment and for applicability evaluations of classification‐ and regression‐based QSARs. Environmental Health Perspectives 111: 1361–1375.
GUM (1993). Guide to the Expression of Uncertainty in Measurements. Geneva: ISO.
Hansch, C., Leo, A., and Hoekman, D. (1995). Exploring QSAR—Fundamentals and Applications in Chemistry and Biology. Washington, DC: American Chemical Society.
IPCC (2013). Summary for Policymakers. In: Climate Change 2013: The Physical Science Basis. Contribution of Working Group I to the Fifth Assessment Report of the Intergovernmental Panel on Climate Change (ed. T.F. Stocker, D. Qin, G.‐K. Plattner, et al.). Cambridge/New York: Cambridge University Press.
IPCC (2015). Fifth assessment. AR5.
Jiang, Z., Yang, H., Guan, Y., et al. (2000). Chlorine effect on molecular descriptors of disinfection by‐products. Environmental Science 21 (5): 51–54.
Kimmel, C. and Gaylor, D. (1988). Issues in qualitative and quantitative risk analysis for developmental toxicology. Risk Analysis 8: 15–21.
Leo, A. and Hansch, C. (1999). Role of hydrophobic effects in mechanistic QSAR. Perspectives in Drug Discovery and Design 17: 1–25.
Linhart, H. and Zucchini, W. (1986). Model Selection. New York: Wiley.
Martinez, W.L. (2002). Computational Statistics Handbook with MATLAB. Boca Raton, FL: Chapman & Hall/CRC.
Massart, D.L., Vandeginste, B.G.M., Buydens, L.M.C., et al. (1997). Handbook of Chemometrics and Qualimetrics: Part A. Amsterdam: Elsevier Science.
National Bureau of Statistics of China (2012). China Statistical Yearbook. Beijing: China Statistics Press. http://www.stats.gov.cn/tjsj/ndsj/2012/indexeh.htm (accessed January 2018).
NTP (National Toxicology Program) (1988). Toxicology and carcinogenesis studies of chlorodibromomethane. CAS No. 124‐48‐1 in F344/N rats and B6C3F1 mice (gavage studies). TR‐282. https://ntp.niehs.nih.gov/results/pubs/longterm/reports/longterm/index.html (accessed 21 March 2018).
OECD (2007). Guidance document on the validation of (Quantitative) Structure–Activity Relationships [(Q)SAR] models. Paris: Organization for Economic Co‐Operation and Development.
Osten, D. (1988). Selection of optimal regression models via cross‐validation. Journal of Chemometrics 2: 39–48.
Pierotti, A.J. (1999). Chlorine effect on molecular descriptors of disinfection by‐products. MS thesis, Florida International University, Miami, FL.
Pontius, F.W. (1990). Toxicology and drinking water regulations. Journal of the American Water Works Association 90: 17–19.
Stone, M. (1998). Akaike’s criteria. In: Encyclopedia of Biostatistics (eds. P. Armitage and T. Colton). New York: Wiley.
Tang, W.Z. and Wang, F. (2010). Chlorine effect on quantum molecular descriptors of disinfection by‐products‐chlorinated alkanes. Chemosphere 78: 914–921.
Tang, W.Z., Wang, F., Miralles‐Wilhelm, F., and Damisse, E. (2009). Uncertainty analysis of rating equations of submerged orifice flow at gated spillway. Conference on Reliability and Quality in Design, the International Society of Science and Applied Technologies (ISSAT) and the IEEE Reliability Society, San Francisco, CA (6–8 August 2009), 165–169.
US EPA (1997). Guiding principles for Monte Carlo analysis. EPA/630/R‐97/001.
US EPA (2016). NAAQS table. https://www.epa.gov/criteria‐air‐pollutants/naaqs‐table (accessed 29 November 2016).
US EPA (1994). Drinking water maximum contaminant level goals and national primary drinking water regulations for lead and copper. Federal Register 59 (125): 33860–33864.
US EPA (1992). Guidelines for exposure assessment. Federal Register 57: 22888–22936.
US EPA (1996). Safe drinking water act standards and health advisories. Office of Water 4305. EPA‐822‐B‐96‐002.
US EPA (1997). National primary drinking water regulations: disinfectants and disinfection byproducts notice of data availability; proposed rule. EPA‐815‐Z‐98‐005.
US EPA (2005). Guidelines for carcinogen risk assessment. Federal Register 70 (66): 17765–18717. http://www.epa.gov/raf/pubalpha.htm (accessed 15 December 2017).
US EPA (2006). Ultraviolet disinfection guidance manual for the final long term 2 enhanced surface water treatment rule. EPA 815‐R‐06‐007, November 2006.
US EPA (2011). Exposure factors handbook 2011 edition (Final). EPA/600/R‐09/052F. Washington, DC: US Environmental Protection Agency.
US EPA (2012). Benchmark dose technical guidance. Risk Assessment Forum, EPA/100/R‐12/001. Washington, DC: US Environmental Protection Agency, June 2012.
US EPA (2015). User’s Manual for Benchmark Dose Software 2.6. Washington, DC: US Environmental Protection Agency.
US EPA (2016). Benchmark dose tools. https://www.epa.gov/bmds (accessed 20 November 2016).
Wang, W., Qian, Y.C., Li, J.H., et al. (2014). Analytical and toxicity characterization of halo‐hydroxyl‐benzoquinones as stable halobenzoquinone disinfection byproducts in treated water. Analytical Chemistry 86: 4982–4988.
World Health Organization (WHO) (2014). The world cancer report. Geneva: WHO.
World Health Organization (WHO) (2016). World health statistics 2016: monitoring health for the SDGs, sustainable development goals. Geneva: WHO.
Yang, H., Jiang, Z., Guan, Y., et al. (2000). Correlations between SDWA standards and health advisories concentrations of DBPs and molecular descriptors. Water and Wastewater Engineering 26 (1): 22–25.
Zhao, P., Dai, M., Chen, W., and Li, N. (2010). Cancer trends in China. Japanese Journal of Clinical Oncology 40 (4): 281–285.
