access indexes 12
accuracy 2, 9, 11, 13, 17, 19, 24–25, 33–34, 43, 47, 50, 55–57, 60–63, 67, 71, 73, 75, 76, 78–80, 89
adjacency matrix 88
analysis window 37
analytical resolution 10, 25, 67
automatic text categorization 13, 62, 68, 71, 74, 77–78, 80–85
bibliometrics 87
Boolean 13
categorization 1–3, 13, 25, 44–45, 48, 56, 60, 62, 66, 68, 71–85
Central Intelligence Agency 21
citation indexes 12
closed captioning 19
clustering 2, 11, 43, 55–56, 60, 71–85, 92, 96
Coleman-Liau Index 27
collections 7, 10–11, 13–14, 18, 21, 30–31, 34, 41, 44, 59, 69, 71–72, 74–75, 80, 82–83, 87, 90, 97
confusion matrix 76
consistency 2, 45, 50, 56, 61, 77
co-occurrence 1–2, 6, 36–42, 68, 88–89, 96
correlation 4, 29, 31–32, 36–42, 45, 47, 56, 63, 69, 87–88
CPAN 6
DICTION 3, 27–29, 32, 35, 44, 62, 66–68, 97–99
Dictionary of Affect in Language 66–67, 98, 99
directionality 38
distance matrix 88
document extraction 23
emotion 11, 19, 20, 28, 44, 65–70
entity extraction 38, 43–56, 87, 96
evolution 11, 30–32, 65, 91, 96
extraction 1–2, 6, 23–25, 29, 38, 43–64, 72, 84, 87, 89, 96
feature reduction 73–74, 78, 85
filtering 24
Flesch-Kincaid Index 27
Foreign Broadcast Information Service 21
Freedom of Information Act 13
Fulltext Sources Online 13
Gale Directory of Databases 98
gender 26, 30, 33–35, 41, 45, 72–73, 98
General Architecture for Text Engineering 5, 50
Geographic Names Information System 52
GEOnet Names Server 53
Google 8, 12, 21–22, 28, 51, 56, 62, 68, 80
Gunning-Fog Index 27
hapax legomena 31
keyword 7, 8, 11, 13, 17, 22, 24–25, 29, 31, 39, 42–43, 47, 50, 61–63, 67, 71, 78, 80
learning algorithm 74–75, 78–79, 84–85
lexeme 33
lexicon 1, 5, 6, 39–56, 65–68, 70, 85, 99
Lexis Academic 8
LexisNexis 8, 12, 15–16, 21, 49
Library of Congress 9
licensing restrictions 7, 14, 16, 25
metadata 20, 30, 39–42, 45, 80, 87, 96
Minnesota Contextual Content Analysis 44
National Center for Supercomputing Applications 6, 50
network 1–3, 22, 23, 40, 68, 74–75, 86–98
New York Times 14, 16–17, 20, 24, 53, 98
newspapers 8, 11, 13–16, 20–21, 24, 65
n-gram 73
Orange Book 48
overlap 45, 46, 55, 60, 81, 88
partitional clustering 82
pendant 86
Porter stemmer 34
Practical Extraction and Reporting Language 6
progressiveness 26
ProQuest Historical Newspapers 8, 11
randomization 18
rejection rate 76
reliability 2, 67, 68, 77, 79, 99
reproducibility 2
reshaping 25
scale 1–2, 5, 8, 17, 24, 44, 49, 50, 57, 66–69, 77, 89, 96
search 6–9, 11–14, 16–18, 20–25, 31, 33, 40, 42, 46–51, 53, 55, 58, 71, 80, 85, 97
semantic 11, 36, 37, 38, 40, 44, 47, 57, 58–63, 67–69, 73, 79, 89, 97, 98
sentiment analysis 1, 3, 5–6, 29, 43, 65–70
similarity 11, 33, 45, 47, 60, 71–75
site mirroring 22
slang 29
source stability 8
Sourcebook to Public Record Information 13
specialized tools 40608
structured 58
Summary of World Broadcasts 15, 21
surface statistics 8
terms of use 22
topic extraction 1–2, 6, 29, 38, 57–64
true negative 76
understanding a source 15
unintended uses 9
University of Illinois 6, 97–98
validate 44
Vanderbilt Television News Archive 12
viewership 17
vocabulary 1–6, 11, 22, 24, 26–35, 43, 45, 47, 56–57, 62–65, 67, 69–70, 72, 87, 91
word birth 31
word death 31
World Bank 13
XML 58
Yahoo 8