Chapter 3: SAS Predefined Concepts: Enamex

3.1. Introduction to SAS Predefined Concepts

3.2. Person

3.2.1. Titles in Person Names

3.2.2. Suffixes as Part of a Personal Name

3.2.3. Single-Word Names

3.2.4. Body References

3.2.5. Quotes

3.2.6. Locations as Part of Name

3.2.7. Groups of Individuals

3.2.8. Historical Figures, Saints, and Deities

3.2.9. Animals, Fictional Characters, Artificial Intelligence, and Aliens

3.2.10. Businesses Named after People

3.2.11. Laws, Diseases, Prizes, and Works of Art

3.3. Place

3.3.1. Common Nouns and Determiners

3.3.2. Subnational Regions and Other Descriptors

3.3.3. Street Addresses

3.3.4. Monuments

3.3.5. Celestial Bodies

3.3.6. Neighborhoods

3.3.7. Fictional Place Names

3.3.8. Conjoined Location Names

3.3.9. Special Cases for Nonmatches

3.4. Organization

3.4.1. Corporate Designators or Suffixes

3.4.2. Determiners before Proper Names

3.4.3. Facility Names Associated with an Organization

3.4.4. Groups of Individuals

3.4.5. Aliases

3.4.6. Conjoined Organization Names

3.4.7. Event Names

3.4.8. Special Cases for Nonmatches

3.5. Disambiguation of Matches

3.5.1. Organization or Place

3.5.2. Organization or Product

3.5.3. Organization or Person

3.1. Introduction to SAS Predefined Concepts

As you will recall from the previous chapter, a named entity is one or more words or numeric expressions in sequence which name a single individual or specify an instance of a type in the real world (or an imaginary world).

SAS provides a set of seven predefined entities called predefined concepts, spanning the three types of entities described in chapter 2:

  • Enamex (Person, Place [Location], Organization), detailed in this chapter
  • Timex (Date, Time), detailed in chapter 4
  • Numex (Money, Percent), detailed in chapter 4

All fully supported languages also provide a predefined grammatical pattern to aid in the recognition of multiwords and complex concepts:

  • Noun group, detailed in chapter 4

Although the rules that are used for the predefined concepts are proprietary and not displayed in the products, you can learn more about the principles and assumptions that form the basis for the rules for each of the predefined concepts in the sections that follow. Knowing what matches to expect from the predefined concepts helps you predict and modify their behavior more accurately. It also helps you identify the areas where custom concepts would be most useful for your particular extraction task.

In addition, this information can help you measure the effectiveness of an information extraction system by acting as a standards manual for setting up and annotating a gold standard corpus, as well as for data collection, with all targeted named entities marked in a consistent manner. Measuring the value of information extraction without first defining the targeted entities is like using a yardstick with no numbers or lines. The information in this chapter and in chapter 4 defines the numbers and lines on that yardstick.
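
One way to make the yardstick concrete is to score the matches of a system against the annotations in a gold standard corpus. The following Python sketch assumes exact-span scoring, in which a predicted match counts as correct only if its start offset, end offset, and concept label all agree with a gold annotation. The `Span` tuple and the `score` function are illustrative names, not part of any SAS product, and other scoring schemes (such as partial-overlap credit) are equally possible.

```python
from typing import NamedTuple, Set


class Span(NamedTuple):
    start: int   # character offset where the match begins
    end: int     # character offset just past the end of the match
    label: str   # concept name, such as "Person" or "Place"


def score(gold: Set[Span], predicted: Set[Span]) -> dict:
    """Exact-match precision, recall, and F1 (an assumed scoring scheme)."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Text: "Ms. Jones visited SAS in Cary."
gold = {
    Span(0, 9, "Person"),          # "Ms. Jones" (title included)
    Span(18, 21, "Organization"),  # "SAS"
    Span(25, 29, "Place"),         # "Cary"
}
# A hypothetical system that dropped the title from the Person match.
predicted = {
    Span(4, 9, "Person"),          # "Jones"
    Span(18, 21, "Organization"),
    Span(25, 29, "Place"),
}
result = score(gold, predicted)
```

In this example, the Person match with the wrong scope counts against both precision and recall, which is why a consistent definition of match boundaries matters before any measurement takes place.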

Referencing these standards can also be a useful step in troubleshooting matches. It can help you align expectations regarding the existence and disambiguation of matches and their scope in various contexts.

This chapter and chapter 4 are a reference that you can keep coming back to as you work with named entities. Because these chapters serve as a set of annotation guidelines for typical named entities, you can use them whether you are using SAS Text Analytics or implementing your own set of entity rules with other approaches or software. The content is based on extensive research, historical definitions, and best practice guidelines that the SAS linguists prepared while developing cross-linguistic standards for predefined concept extraction in more than 30 languages.

In this chapter and chapter 4, matches that meet the definition of each predefined concept type are denoted in square brackets. For example, in the phrase: “the company [SAS],” only “SAS” is an extracted match (for Organization).
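
If you adopt the same square-bracket convention for your own annotated examples, you can convert them into character offsets for a gold standard corpus. The following Python sketch shows one possible way to do so. The `parse_bracketed` function is a hypothetical helper, and it assumes that brackets are balanced, are never nested, and never occur as literal characters in the underlying text.

```python
import re
from typing import List, Tuple


def parse_bracketed(annotated: str) -> Tuple[str, List[Tuple[int, int, str]]]:
    """Return the plain text and (start, end, match) offsets into that text."""
    plain_parts: List[str] = []
    spans: List[Tuple[int, int, str]] = []
    position = 0  # length of the plain text that has been built so far
    # The capture group makes re.split keep the bracketed pieces.
    for piece in re.split(r"(\[[^\[\]]*\])", annotated):
        if piece.startswith("[") and piece.endswith("]"):
            match_text = piece[1:-1]  # strip the surrounding brackets
            spans.append((position, position + len(match_text), match_text))
            piece = match_text
        plain_parts.append(piece)
        position += len(piece)
    return "".join(plain_parts), spans


text, spans = parse_bracketed(
    "[John Smith] and [Mark Frank] met at the company [SAS]."
)
```

Each returned offset pair indexes into the plain text, so `text[start:end]` reproduces the bracketed match. The concept label is not encoded in the brackets and would have to be recorded separately.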

3.2. Person

Person is a predefined concept provided by the SAS linguists. Note that the name of this concept in your product may be nlpPerson or another similar name. The generic “Person” label is used in this book, because it aligns with industry standard practices and is similar to any concept name used in the SAS Text Analytics products in the past.

Person includes any proper name used to designate a specific individual in the real or in an imaginary world. Individual includes any intelligent agent: any real or fictional human, alien, deity, artificial intelligence, or animal.

The matches for Person include two or more of the following:

  • First name
  • Last name
  • Middle name
  • Maiden name
  • Nickname
  • Initials
  • Infixes (such as “van”, “von”, “van der”, and “de”)
  • Suffixes (such as “Jr.” or “Sr.”)
  • Other names specific to particular cultures (for example, Russian patronymic, such as “Alekseevna”)
  • Title of address

See section 3.2.3 for a discussion of when single-word names are considered Person matches. References with only an initial or initials and no other name must also have a title captured as part of the match (for example, “Mr. T.”). References to people that are not proper names, including common nouns and pronouns, are not matches for the Person concept. The match is always the longest possible combination of allowed elements.

Words that are leveraged to identify a potential match for Person include job titles and verbal constructions indicating agents of human-like actions, such as “exclaim.” These markers are not retained in the matched string; they are leveraged only as contextual cues.

Remember: Person includes any proper name designating a specific individual in the real or imaginary world.

Special cases that govern whether certain words are included in the match are described in the following sections.

3.2.1. Titles in Person Names

The matches for Person include the following titles of address:

  • Common titles
  • Familial titles
  • Professional titles
  • Religious titles
  • Military titles
  • Royalty titles

In the contexts where a person can be addressed in spoken communication with the title and first and last name, only first name, or only last name, that title is included as part of the tagged match for Person. However, job titles or descriptions are not matches for Person.

Consider the following examples of strings referencing persons:

  • Mr. President
  • Ms. Jones
  • Professor L. Noh
  • The Pope
  • Queen Elizabeth
  • Secretary of Health and Human Services
  • Secretary Sylvia Mathews Burwell
  • Sylvia Mathews Burwell
  • Father M.
  • Aunt B
  • Sergeant York
  • Miss Know-It-All
  • Princess Mary of Kent
  • King Henry VIII
  • The Duke of York
  • The President of the United States
  • President Lincoln
  • CEO of SAS
  • The Olympian
  • T. said that I should go

Pause and think: Can you identify the Person matches in the above examples?

Matches include only the following:

  • [Ms. Jones]
  • [Professor L. Noh]
  • [Queen Elizabeth]
  • [Secretary Sylvia Mathews Burwell]
  • [Sylvia Mathews Burwell]
  • [Father M.]
  • [Aunt B]
  • [Sergeant York]
  • [Princess Mary of Kent]
  • [King Henry VIII]
  • [President Lincoln]

Titles like “Secretary of Health and Human Services,” “CEO of SAS,” “Pope,” and “Duke of York” are job or professional titles that can refer to more than one individual throughout history. Such relative references, including phrases such as “Miss Know-It-All,” are not specific enough to be considered a match for the Person concept. In addition, only an initial is not enough context for a match to the Person concept.

3.2.2. Suffixes as Part of a Personal Name

Suffixes on names that are part of the specific designation of an individual and not simply related to education or career are included in the match, together with the name or names. Consider the following examples:

  • Mary Johns Ph.D.
  • John James Jr.
  • Frank Sr.
  • P. Smith M.D.
  • Rob Moore PMP
  • S. Matthews III

Pause and think: Can you identify the matches in the examples above?

Matches include the following:

  • [John James Jr.]
  • [Frank Sr.]
  • [S. Matthews III]

Only the first names, last names, and initials are matched in the following:

  • [Mary Johns] Ph.D.
  • [P. Smith] M.D.
  • [Rob Moore] PMP

The suffixes that follow the last names in these examples refer to academic and professional designations, such as those used in the medical and business fields. Therefore, they are not included in the match.
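
If you implement your own rules, the distinction between the two kinds of suffixes can be approximated with a small lookup list. The following Python sketch is only an illustration: the `PROFESSIONAL_SUFFIXES` list and the `person_span` function are hypothetical, the list is far from complete, and the function assumes that the input has already been identified as a name followed by zero or more suffixes.

```python
# Illustrative list only; the rules behind the predefined concepts are proprietary.
PROFESSIONAL_SUFFIXES = {"Ph.D.", "M.D.", "PMP", "CPA"}


def person_span(candidate: str) -> str:
    """Drop trailing professional suffixes and keep everything else."""
    tokens = candidate.split()
    # Remove suffixes from the right edge until a non-professional token is found.
    while tokens and tokens[-1] in PROFESSIONAL_SUFFIXES:
        tokens.pop()
    return " ".join(tokens)


examples = ["Mary Johns Ph.D.", "John James Jr.", "Rob Moore PMP", "S. Matthews III"]
matches = [person_span(example) for example in examples]
```

Generational suffixes such as “Jr.” and “III” are not in the list, so they stay inside the match, which mirrors the examples in this section.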

3.2.3. Single-Word Names

Single-word names are included only when the context (person suffixes, job names, birthdays, or other person-related information) indicates a probable match. Consider the following examples:

  • Kent exceeded . . .
  • Jones, CEO of MyCorp, said . . .
  • Lementa, born 1962 . . .
  • Gary was nice

Pause and think: Can you identify the matches in the examples above?

Matches include only the following:

  • [Jones], CEO of MyCorp, said . . .
  • [Lementa], born 1962 . . .

In the remaining examples, the proper nouns are ambiguous because there is not enough context to infer that the reference is to a person. For example, Kent could be a person, company, product, or place name. Similarly, Gary is a common English name for persons but could also refer to a town in Indiana.
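
The role of context can be sketched with a pattern that accepts a single capitalized word only when a person-related cue follows it. The following Python example is a rough approximation under stated assumptions: the `CUES` pattern, the short list of job titles, and the `single_word_person` function are all hypothetical, and real rules would need a much richer inventory of cues.

```python
import re
from typing import Optional

# Hypothetical cues: a job title followed by "of <company>", or a birth year.
CUES = re.compile(
    r"\b(?P<name>[A-Z][a-z]+), "
    r"(?:(?:CEO|CFO|president|director) of \w+|born \d{4})"
)


def single_word_person(text: str) -> Optional[str]:
    """Return the single-word name if a person cue follows it, else None."""
    found = CUES.search(text)
    return found.group("name") if found else None


with_cue = single_word_person("Jones, CEO of MyCorp, said . . .")
without_cue = single_word_person("Kent exceeded . . .")
```

Note that the cue itself is not part of the returned match; it is used only to decide whether the preceding word is extracted.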

3.2.4. Body References

References to a body part, remains, or corpse of a person are not considered a part of the match. Consider the following examples:

  • John’s body
  • The body
  • Arms and legs
  • The remains of Mr. Smith
  • Mimi’s singing voice

Pause and think: Can you identify the matches in the examples above?

Matches include only the following:

  • [John]’s body
  • The remains of [Mr. Smith]
  • [Mimi]’s singing voice

Note that the remaining words in the matches above provide reasonably unambiguous context that the proper nouns are referring to persons.

3.2.5. Quotes

Quotes around a descriptive nickname are included within the name match if they appear within or overlap the boundaries of a person’s name. Consider the following examples:

  • James “the Bully” Holtz
  • “James the Bully” Holtz
  • James Holtz “The Bully”

Pause and think: Can you identify the matches in the examples above?

Matches include the following:

  • [James “the Bully” Holtz]
  • [“James the Bully” Holtz]
  • [James Holtz] “The Bully”

The nickname is not included in the match for the final example because it does not appear within the boundaries of the person’s name.

3.2.6. Locations as Part of Name

Locations that are part of the name are included in the match and not matched separately as Place. However, mentions that consist only of a title and a location are not matches. In addition, locations named for people are not included as matches to the Person concept.

Consider the following examples:

  • Duchess of York
  • Grand Duke of Saxe-Weimar-Eisenach William Ernest
  • The city of Bismarck
  • Princess Anna of Sedgewick
  • Fort William

Pause and think: Can you identify the Person matches in the examples above?

Matches include only the following:

  • [Grand Duke of Saxe-Weimar-Eisenach William Ernest]
  • [Princess Anna of Sedgewick]

The first example is not considered a Person match because it is a title that could refer to different people throughout history. In the second example, the title contains a location name but is followed by the first and last names of a specific individual, so the entire string is the match. In the third and fifth examples, the reference is to a location, even though the place name contains a person name. Therefore, they are not considered matches for Person. In the fourth example, the location is included in the Person match because it helps specify which Princess Anna is being referred to.

3.2.7. Groups of Individuals

Groups of individuals such as national, geographic, religious, or ethnic groups; family or dynasty names; or blended names of two individuals are not a match for Person.

Nonmatches include the following:

  • The Kennedy family
  • The Joneses
  • The Daniel twins
  • The House of Hanover
  • The Han dynasty
  • American
  • Frenchwoman
  • Bennifer (a blended name of Ben Affleck and Jennifer Garner)

Some groups of individuals match as Organization:

  • [Democrats]
  • [Girl Scouts]
  • [Marines]

See more about organizations in 3.4.

Terms referring to groups of two or more people are not included as matches to the Person concept. However, conjoined or listed names with elision are included as matches. The listed names are considered one single reference if part of the name is elided. The listed names are considered two or more matches if the names on either side of the conjunction are complete.

Matches include the following:

  • [Mary and John Smith]
  • [John Smith] and [Mark Frank]
  • [John, Mary, Jane and Marsha Smith]
  • [John Smith], [Mary Smith], and [T. Yokel]
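
The elision principle can be sketched as a simple decision: if any conjunct in the list is incomplete, the whole phrase is one match; otherwise, each conjunct is its own match. The following Python example uses a deliberately crude test, treating any conjunct with fewer than two tokens as having an elided part. The `split_person_list` function and this token-count heuristic are assumptions made for illustration only.

```python
import re
from typing import List


def split_person_list(phrase: str) -> List[str]:
    """Return the Person matches for a conjoined or listed phrase."""
    # Split on commas (optionally followed by "and") or on a bare "and".
    conjuncts = [
        part for part in re.split(r",\s*(?:and\s+)?|\s+and\s+", phrase) if part
    ]
    # Crude assumption: a one-token conjunct has an elided part of the name.
    has_elision = any(len(part.split()) < 2 for part in conjuncts)
    return [phrase] if has_elision else conjuncts


one_match = split_person_list("Mary and John Smith")
two_matches = split_person_list("John Smith and Mark Frank")
```

A one-token conjunct such as “Mary” signals elision, so the phrase is kept whole, whereas complete names on both sides of the conjunction are returned separately.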

Consider the following examples:

  • Latinos
  • Muslims
  • Republicans
  • The Habsburgs
  • Brangelina
  • Tolbert triplets
  • Nicole, Erica, and Jaclyn Dahm
  • Barack and Michelle Obama
  • Plácido Domingo, José Carreras, and Luciano Pavarotti

Pause and think: Can you identify the matches for the Person concept in the examples above?

Matches for the Person concept include only the following:

  • [Nicole, Erica, and Jaclyn Dahm]
  • [Barack and Michelle Obama]
  • [Plácido Domingo], [José Carreras], and [Luciano Pavarotti]

The first six examples refer to ethnic, religious, and political groups of people, as well as a family name, a blended name, and a named group of siblings. None of these examples match the Person concept.

3.2.8. Historical Figures, Saints, and Deities

Names of saints and other historical figures are included, unless the context indicates that they appear as a part of the name of another predefined concept type. Proper names for deities are a match, but not references to deities generally, descriptive references, or exclamations.

Consider the following examples:

  • George Washington
  • George Washington bridge
  • St. Francis Cathedral
  • St. Francis of Assisi
  • God
  • God!
  • The god
  • Jehovah
  • Allah
  • The Prophet
  • our Lord

Pause and think: Can you identify the matches for the Person concept in the examples above?

Matches for the Person concept include only the following:

  • [George Washington]
  • [St. Francis of Assisi]
  • [God]
  • [Jehovah]
  • [Allah]

The second and third examples are not matches for the Person concept because they are referring to locations, namely a bridge and a cathedral. The sixth example is an exclamation, whereas the remaining nonmatches are not specific enough to refer to one particular deity.

3.2.9. Animals, Fictional Characters, Artificial Intelligence, and Aliens

The proper names of animals, fictional characters, artificial intelligence, and aliens are matches.

Consider the following examples:

  • Mr. Ed the talking horse
  • Eevee (type of Pokemon creature)
  • Time Lord
  • E.T.
  • Martians
  • Vulcans
  • Baloo—Mowgli’s friend

Pause and think: Can you identify the matches in the examples above?

Matches include only the following:

  • [Mr. Ed] the talking horse
  • [E.T.]
  • [Baloo]—[Mowgli]’s friend

Matches do not include species, such as Eevee, Martians, or Vulcans, because they are groups.

3.2.10. Businesses Named after People

Names of humans, any of which could also be the name of a business, are included as matches to Person unless there is a contextual cue that the name applies to the business, not to the individual. Organization names with embedded person names are not included as matches.

Consider the following examples:

  • Dr Kelly Macgroarty
  • Steven L. Cox, CPA
  • Akram & Associates
  • Jaclyn Christie Podiatrists

Pause and think: Can you identify the matches for the Person concept in the examples above?

Matches for the Person concept include only the following:

  • [Dr Kelly Macgroarty]
  • [Steven L. Cox], CPA

Note that the third and fourth examples are not matches because context, such as “& Associates” and “Podiatrists,” identifies a business even though part of the company name may be a person name.

3.2.11. Laws, Diseases, Prizes, and Works of Art

The following situations are not included as matches:

  • Laws or legal acts named for people, such as “Dodd-Frank Act”
  • Diseases named for people, such as “Alzheimer’s”
  • Prizes named for people, such as “the Nobel Prize”
  • Works of art named for people, such as “The Birth of Venus”

3.3. Place

Place is a predefined concept provided by the SAS linguists. Note that the name of this concept in your product may be nlpPlace or another similar name. The generic “Place” label is used in this book, noting that “nlpPlace” and any concepts found in SAS products that have Location within their name are equivalent.

Place includes any proper name or defined expression commonly used to designate a specific site in the real or in an imaginary world, as well as any geo-political entity (GPE). Site includes any geographical point or area in physical space, on earth or elsewhere, including imaginary worlds. GPE is a composite of the following:

  • Population
  • Government
  • Physical location
  • Nation

For example, GPE includes province, state, county, city, town, and others.

Remember: Place includes any proper name or expression designating a specific site or geo-political entity in the real or imaginary world.

In addition to site names and GPE names, matches for Place include location expressions. For example, matches include the following:

  • Postal address, crossroads, geographical coordinates expressed as longitude–latitude pairs, or military grid reference system (MGRS) coordinates
  • Names of continents
  • Regions that are subcontinental, transcontinental, subnational, or transnational
  • Nations or countries
  • States, provinces, cantons, counties, or district names
  • Cities, towns, villages, and hamlets
  • Clusters of GPEs that function as political entities
  • Airport names and official codes
  • Highways, street names, bridge names, and road names
  • Street addresses (postal and crossroads)
  • Fictional or mythological geographical locations
  • Geographical coordinates expressed as longitude–latitude pairs or as MGRS coordinates
  • Named geographical features, including mountain ranges and bodies of water
  • Park names

Words that are leveraged to indicate a potential match for Place are the following:

  • Locative prepositions
  • Verbal constructions that indicate nations or governments acting as people
  • Words indicating a type of location, like “planet,” “nation,” and “government”

Special cases that govern whether certain words are included in the match are described in the following sections.

3.3.1. Common Nouns and Determiners

Common nouns may be included in the name if they help clarify the concept or are truly treated in language and by societal conventions as a predefined concept, whether capitalized or not. Determiners like English “a” or “the” may also be included if they are considered a part of the name. For example, the determiner is included in the match of “[Democratic Republic of the Congo]” but not in “the [Southeastern United States].”

Consider the following examples:

  • In the river
  • The river Seine
  • The Amazon River
  • The Ruhr valley
  • Through the valley
  • Mississippi River west bank
  • The disputed area of Jordan’s West Bank
  • The Hague

Pause and think: Can you identify the matches in the examples above?

Matches include only the following:

  • The [river Seine]
  • The [Amazon River]
  • The [Ruhr valley]
  • [Mississippi River] west bank
  • The disputed area of [Jordan’s West Bank]
  • [The Hague]

The first and fifth examples do not produce a match because they do not include a proper noun. Note that the determiner is included as part of the match only in the final example.

3.3.2. Subnational Regions and Other Descriptors

Subnational regions are not included when they are referenced only by compass-point modifiers. Generally, the text must contain enough explicit information that the location could be plotted or an area drawn on a map. Historic modifiers and other descriptors are included only if they are part of the official name.

Consider the following examples:

  • South America
  • The Southeastern United States
  • South Pacific
  • The South
  • The Southwest region
  • The mid-West
  • Former Soviet Union
  • Former Yugoslav Republic of Macedonia
  • Ivory Coast
  • The coast of Hawaii
  • Eastern North Dakota

Pause and think: Can you identify the matches in the examples above?

Matches include the following:

  • [South America]
  • The [Southeastern United States]
  • [South Pacific]
  • Former [Soviet Union]
  • [Former Yugoslav Republic of Macedonia]
  • [Ivory Coast]
  • The coast of [Hawaii]
  • Eastern [North Dakota]

Examples like “the South” and “the Southwest region” are not specific enough to be pinpointed on a map, because they could refer to locations in various countries. Note that adjectives such as “former” or nouns such as “coast” are not included in the match when they are a historical or geographical reference, but are included if they are part of the official name of a country.

3.3.3. Street Addresses

Street addresses are included if they contain enough information to identify a specific point on a street or to zero in on a specific building or multi-structure facility with some background information about country and city/town/province as assumed knowledge. For the match to be a Place, it has to be able to be found on a map without guesswork.

Consider the following examples:

  • 123 Main Str., Raleigh, NC
  • 123 Main
  • Empire State Building
  • The Bank of America Tower in NYC
  • Disney World
  • Disneyland Paris
  • The Eiffel Tower
  • The North Carolina Museum of Art

Pause and think: Can you identify the matches in the examples above?

Matches include the following:

  • [123 Main Str., Raleigh, NC]
  • [Empire State Building]
  • The [Bank of America Tower in NYC]
  • [Disney World]
  • [Disneyland Paris]
  • The [Eiffel Tower]

The remaining two examples are not matches for the Place concept, because the context is not specific enough. The references could be to an organization rather than a place.

3.3.4. Monuments

Monuments that are not aliases for organizations running them are included as matches. All other facilities or buildings are excluded unless they are an airport or they fit the criteria for address.

Consider the following examples:

  • The Great Wall of China
  • The Eiffel Tower
  • Mt. Rushmore
  • . . . said the White House
  • The Vatican
  • The North Carolina Museum of Art

Pause and think: Can you identify the matches for the Place concept in the examples above?

Matches for the Place concept include only the following:

  • The [Great Wall of China]
  • The [Eiffel Tower]
  • [Mt. Rushmore]

The remaining three examples contain matches for the Organization concept.

3.3.5. Celestial Bodies

Names of heavenly bodies and locations are matches so long as the reference is to a specific heavenly body. Consider the following examples:

  • Our sun
  • The moon’s glow
  • The smartest person on earth
  • A sun like ours
  • Any moon will glow
  • Waste and earth being trucked
  • . . . earth-like
  • Welcome to the afterlife
  • Earthy old knowledge
  • The dead go to heaven or hell or sometime to Limbo
  • From here to Pluto

Pause and think: Can you identify the matches in the examples above?

Matches include only the following:

  • Our [sun]
  • The [moon]’s glow
  • The smartest person on [earth]
  • The dead go to [heaven] or [hell] or sometime to [Limbo]
  • From here to [Pluto]

In the remaining examples, the references are not to specific celestial objects; therefore, no matches are extracted to the Place concept.

3.3.6. Neighborhoods

Names of neighborhoods are included, but generic references to parts of cities or towns are not matches. Consider the following examples:

  • The Bronx
  • Midland Beach
  • Bay Terrace
  • Lower Manhattan
  • The northernmost borough of NYC
  • South NYC

Pause and think: Can you identify the matches in the examples above?

Matches include the following:

  • The [Bronx]
  • [Midland Beach]
  • [Bay Terrace]
  • [Lower Manhattan]
  • The northernmost borough of [NYC]
  • South [NYC]

Note that the cardinal points are not included in the matches in these examples.

3.3.7. Fictional Place Names

Fictional and nonphysical places with names are considered a match so long as the reference is to a specific place. If the reference is generic, it is not a match.

Consider the following examples:

  • . . . paradise
  • . . . fantasyland
  • La La Land
  • Oz
  • Camelot
  • The Garden of Eden
  • Tatooine

Pause and think: Can you identify the matches in the examples above?

Matches include only the following:

  • [La La Land]
  • [Oz]
  • [Camelot]
  • The [Garden of Eden]
  • [Tatooine]

The first two examples are not proper nouns and therefore not matches. The remaining examples are matches because they name specific locations.

3.3.8. Conjoined Location Names

When more than one location name in a row is encountered, they are considered one Place match if the relationship between them is hierarchical and they are adjacent or separated by punctuation or prepositions that establish the hierarchical relationship. They are also considered one Place match if the location names are conjoined or listed with elision. Leading prepositions are not included in the match.

Consider the following examples:

  • Dallas, TX
  • Frankfurt, Germany
  • . . . in Orlando and Miami, Florida
  • . . . across the Pacific or Atlantic Oceans
  • I went to Dayton, Ohio and then to Columbus
  • . . . came from Dayton, Ohio and not from Columbus, Ohio

Pause and think: Can you identify the matches in the examples above?

Matches include the following:

  • [Dallas, TX]
  • [Frankfurt, Germany]
  • . . . in [Orlando and Miami, Florida]
  • . . . across [the Pacific or Atlantic Oceans]
  • I went to [Dayton, Ohio and then to Columbus]
  • . . . came from [Dayton, Ohio] and not from [Columbus, Ohio]

Note that the leading prepositions are not included in the match and that only the final example produces two matches because of intervening text.

3.3.9. Special Cases for Nonmatches

Special cases that are excluded from matches as Place are as follows:

  • A city, state, or district name used to refer to a sports team (For example, in the sentence “Boston defeated Cleveland,” both “Boston” and “Cleveland” are categorized as Organization rather than Place, because the reference is intended to be to the sports teams, not the cities.)
  • Names of artifacts or products or services of organizations, including names of newspapers, websites, and media broadcasts (See section 3.4 for potentially using these names as references to Organizations.)
  • Names of purely digital locations, computer memory, websites, or tools, like the Dark Web, the Internet, and Wikipedia
  • Location names embedded in person names, organization names, or times
  • Names of gulags, forced labor camps, or other similar facilities
  • Adjectival forms of location or language names, such as “French cuisine” and “in Japanese”
  • Works of art with location names embedded in their names, such as “Washington Crossing the Delaware”
  • Names for the people from a location, unless the name also happens to describe the location (For example, “Americans,” “Aussies,” “the British,” “Chinese,” and “English” are not matches.)

3.4. Organization

Organization is a predefined concept provided by the SAS linguists. Note that the name of this concept in your product may be nlpOrganization or another similar name. The generic “Organization” label is used in this book because it is an industry standard term and reflects previous names used in SAS products for this concept.

Organization means a formally established association. The matches for Organization include the proper names, common aliases, nicknames, or stock ticker symbols of businesses, government units, sports teams, clubs, and formally organized artistic groups. Common types of organizations are as follows:

  • Stock exchanges
  • Specifically named military organizations, including armies, navies, and special forces
  • Paramilitary organizations, governing bodies, and government departments
  • Nongeneric names of parts of a government
  • Educational organizations
  • Commercial organizations
  • Entertainment organizations, bands, and performing groups
  • Media organizations and conglomerates
  • Political parties, advocacy groups, and think tanks
  • Professional regulatory and advocacy groups
  • Unions (but not names of various job types or categories)
  • Charitable organizations and nonprofits
  • International regulatory and political bodies
  • Religious organizations like denominations or guidance bodies (but not members of a particular religion, unless there is only one guidance body across the whole religion)
  • Medical-science organizations and research institutions
  • Organizations participating in or facilitating sporting and gaming events
  • Official clubs like [Toastmasters International], the [Masons], or [Alcoholics Anonymous]
  • Coalitions or alliances of governments
  • Multinational organizations

Examples of aliases, nicknames, and pseudonyms include the following:

  • [NYPD], an alias for the [New York Police Department]
  • [GOP], an alias for the [Republican Party]
  • [Big Blue], an alias for [IBM]

Matches also include stock ticker symbols, such as [MSFT] and [CSCO].

The proper names for groups of individuals closely associated with a specific organization are also considered matches. For example, [Girl Scouts] is a proper name associated with [Girl Scouts of America] and [Democrats] is a proper name associated with the [Democratic Party]. Generic names for a type of group or organization, like Latinos, feminists, police, or army, are not considered matches. But a specific proper name is a match; for example, the [Los Angeles Police Department] or [U.S. Congress].

In addition, organization names embedded in locations, such as AT&T Stadium, are not matches for Organization because they are referring to a location.

Remember: Organization means the name of a formally established association.

Words that are leveraged to indicate a potential match for Organization include prefixes and suffixes indicative of organizations, verbs associated with businesses or organizations acting like individuals, some prepositions (at, for, with, within, outside of), nouns for associated groups (team, division, chapter, orchestra, club), and facility words as part of the name.

Special cases that govern whether certain words are included in the match are described in the following sections.

3.4.1. Corporate Designators or Suffixes

Corporate designators or suffixes are included in the match. Consider the following examples:

  • Akram & Associates Inc.
  • Asset Management Partners Ltd.
  • Nanoscribe GmbH
  • OOO Stellberg

Pause and think: Can you identify the matches in the examples above?

Matches include the following:

  • [Akram & Associates Inc.]
  • [Asset Management Partners Ltd.]
  • [Nanoscribe GmbH]
  • [OOO Stellberg]

In all the examples, the various corporate designators are included in the matches.

3.4.2. Determiners before Proper Names

Determiners in front of proper names are included only if they are expected as part of the name. In the example of “The Ohio State University,” that university dictates that its name includes “the,” so the entire string ([The Ohio State University]) is the match. In contrast, in the text “the United Nations,” the determiner is not a part of the match (the [United Nations]).

3.4.3. Facility Names Associated with an Organization

Proper names referring to facilities which are closely associated with an organization that runs or owns the facility are included in the match, even if the facility itself is being referenced in a locative context. One exception is airports, which are not considered organizations.

Consider the following examples:

  • Reedy Creek Baptist Church
  • WakeMed Cary Hospital
  • The Vatican
  • The Empire State Building
  • The White House
  • The Trump Tower
  • Westminster Abbey
  • Stanford University
  • RDU airport
  • Disney World

Pause and think: Can you identify the matches for Organization in the examples above?

Matches for the Organization concept include only the following:

  • [Reedy Creek Baptist Church]
  • [WakeMed Cary Hospital]
  • The [Vatican]
  • The [White House]
  • [Westminster Abbey]
  • [Stanford University]

Airports and organization names embedded in locations are not matches, which disqualifies the remaining examples from matching for the Organization concept.

3.4.4. Groups of Individuals

Named groups of individuals with a codified and widely accepted set of criteria for membership in the group are included if they are closely associated with a single specific named organization. Consider the following examples:

  • Christians
  • Muslims
  • Jews
  • Buddhists
  • Boy Scouts of America
  • Girl Scouts

Pause and think: Can you identify the matches in the examples above?

Matches include only the following:

  • [Boy Scouts of America]
  • [Girl Scouts]

The remaining examples denote religious groups and therefore are not matches for Organization.

3.4.5. Aliases

A city, state, or district name is included when it is used to refer to a sports team. This is a common example of metonymy, a type of alias.

Consider the following examples:

  • Boston vs. Cleveland
  • The teams met in Boston
  • They won in Cleveland
  • The Cleveland uniforms are blue

Pause and think: Can you identify the matches in the examples above?

Matches include only the following:

  • [Boston] vs. [Cleveland]
  • The [Cleveland] uniforms are blue

When an organization name and an alias are both present, they are considered two separate matches. Consider the following examples:

  • The Department of Justice (DOJ)
  • Apple (Apple Computers, Inc.) said yesterday . . .
  • University of North Carolina–Chapel Hill (UNC-CH)

Pause and think: Can you identify the matches in the examples above?

Matches include the following:

  • The [Department of Justice] ([DOJ])
  • [Apple] ([Apple Computers, Inc.]) said yesterday
  • [University of North Carolina–Chapel Hill] ([UNC-CH])
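
The "two separate matches" behavior can be sketched in Python as follows. This is only an illustration under a strong assumption: the input string is already known to begin with an organization name, optionally followed by a parenthesized alternative name. The function name `split_name_and_alias` is hypothetical, and the sketch says nothing about how the predefined concept actually recognizes either name.

```python
import re

# A leading name followed by a parenthesized alternative name.
_PAREN_ALIAS = re.compile(r"^(?P<first>[^()]+?)\s*\((?P<second>[^()]+)\)")


def split_name_and_alias(text: str) -> list[str]:
    """Return the name and the parenthesized name as two separate spans.

    Assumes the text starts with an organization name; anything after
    the closing parenthesis is ignored.
    """
    m = _PAREN_ALIAS.match(text)
    if m is None:
        return [text.strip()]
    return [m.group("first").strip(), m.group("second").strip()]


print(split_name_and_alias("Apple (Apple Computers, Inc.) said yesterday"))
```

Note that either span may be the shorter alias: in the Apple example the alias comes first and the full name is in parentheses, so the sketch treats both positions the same way.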

An organization name or alias that is an explicit reference to a product or brand is included. However, the reverse is not true: References to the products or brands themselves are not automatically matched as organizations. Ambiguous references to products or brands are also not included when context does not make clear that the organization itself is meant.

Consider the following examples:

  • Coke is the real thing
  • Coke tried to buy out the competition
  • Honda Civic
  • I drive a Lexus
  • Apple iPhone
  • All iPhones
  • . . . on Google
  • I Googled that yesterday
  • Buy stock in Kleenex
  • I need a Kleenex

Pause and think: Can you identify the matches in the examples above?

Matches include the following:

  • [Coke] tried to buy out the competition
  • [Honda] Civic
  • I drive a [Lexus]
  • [Apple] iPhone
  • . . . on [Google]
  • Buy stock in [Kleenex]

The remaining examples are referring to products rather than organizations and are therefore not matches to Organization.

3.4.6. Conjoined Organization Names

Two or more conjoined or listed organization names are considered separate predefined concept matches, even if it looks like they may share elided material. In this case, the shortened name is considered an alias.

Consider the following examples:

  • . . . at Cisco and Microsoft Corporation
  • Cisco Systems and Apple Inc. are both headquartered in California

Pause and think: Can you identify the matches for the Organization concept in the examples above?

Matches include the following:

  • . . . at [Cisco] and [Microsoft Corporation]
  • [Cisco Systems] and [Apple Inc.] are both headquartered in California

In these examples, although the organization names are conjoined, they are separate matches.

3.4.7. Event Names

Event names are not considered organizations, but the committees and organizations that run the events are. Consider the following examples:

  • Super Bowl XXX
  • The NFL
  • The Olympics
  • The Olympic Committee
  • China Film Festival

Pause and think: Can you identify the matches for the Organization concept in the examples above?

Matches include only the following:

  • The [NFL]
  • The [Olympic Committee]

The remaining examples are not matches, because they are names of events.

3.4.8. Special Cases for Nonmatches

Special cases that are excluded from matches as Organization are as follows:

  • Organization names that are embedded in location names, such as “the Apple headquarters” and “the SAS campus”
  • Industrial sectors and industries or the people or jobs associated with them, such as “accountants,” “health insurance,” or “the medical profession”
  • Works of art with organization names embedded in their names, such as “Campbell’s Soup Cans”

3.5. Disambiguation of Matches

Accounting for situations in which one single predefined concept match or pattern could fall into multiple categories is one of the key challenges of named entity recognition. There are ambiguities between enamex entities because many proper nouns could be names of persons, organizations, or locations. Some examples are listed below.

“Duke” could be part of a Person match or an Organization match:

  • I met [Stanley Duke]
  • We are students at [Duke University]

“Washington” could be referring to a person or place, so it could be part of a Person or Place match:

  • [President George Washington] was there
  • Our capital is [Washington D.C.]

“Chelsea” could be a part of a Person match, Place match, or Organization match:

  • Their daughter is [Chelsea Clinton]
  • She was born in [Chelsea]
  • He played for [Chelsea club]

Ambiguities are also encountered between enamex and numex entities, as mentioned in chapter 4. In addition, the same text string could be a predefined concept match or not. For example, the acronym “NER” could stand for nucleotide excision repair (nonmatch) or the North-East Railway (Organization).

The SAS predefined concepts account for these types of ambiguity by leveraging contextual cues like common titles, professions, abbreviations, prefixes or suffixes, appositives, and nominal and verbal constructions.
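
To make the idea of contextual cues concrete, here is a minimal Python sketch. It is not how the SAS predefined concepts work internally, because those rules are proprietary. The cue lists and the function `guess_concept` are hypothetical and chosen only to cover a few of the examples above. The sketch looks at the word before and the word after an ambiguous name and applies three cues in a fixed order.

```python
from typing import Optional

# Hypothetical cue lists, chosen only to cover examples from this section.
ORG_HEAD_NOUNS = {"University", "club", "Corporation"}
PERSON_TITLES = {"President", "Dr.", "Mr.", "Ms."}
PLACE_PREPOSITIONS = {"in", "near"}


def guess_concept(sentence: str, target: str) -> Optional[str]:
    """Guess a concept type for `target` from its immediate neighbors.

    Returns "Organization", "Person", "Place", or None when no cue fires.
    """
    tokens = sentence.split()
    i = tokens.index(target)  # raises ValueError if target is absent
    prev_tok = tokens[i - 1] if i > 0 else ""
    next_tok = tokens[i + 1] if i + 1 < len(tokens) else ""

    if next_tok in ORG_HEAD_NOUNS:      # "Duke University", "Chelsea club"
        return "Organization"
    if prev_tok in PERSON_TITLES:       # "President Washington"
        return "Person"
    if prev_tok in PLACE_PREPOSITIONS:  # "born in Chelsea"
        return "Place"
    return None


print(guess_concept("We are students at Duke University", "Duke"))
print(guess_concept("She was born in Chelsea", "Chelsea"))
```

The order of the checks is a design choice in this sketch: the organization cue is tested first so that a phrase such as “in Chelsea club” would not be labeled Place merely because of the preposition.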

Sometimes it is difficult to distinguish from context whether the reference is to a place or an organization, because of metonymy, meaning the use of one term as a stand-in for another. For example, sports teams (organizations) from a particular location are often referred to as that location, as in “Buffalo’s win over New York.” Similarly, the work of government officials or departments is sometimes referred to by the name of the location, as in “Germany unveils new law.” In these and other similar cases, the following predefined concept guidelines offer some direction.

3.5.1. Organization or Place

The following situations describe matches for the Organization concept:

  • A city, state, district, or country name used to refer to a sports team or government
  • Facilities or buildings that are aliases for organizations running them
  • A string containing an Organization followed by a street address or other location (the organization name is a match for the Organization concept)

The following situations describe matches for the Place concept:

  • Facilities or buildings that are not aliases for organizations running them
  • An airport or location that aligns with the definition of an address match, in that it identifies a place that can be plotted on a map
  • An organization name that is embedded in a location name (there are no overlapping matches for two separate predefined concept types, so the entire location name matches only as Place)
  • A string containing an Organization followed by a street address or other location (the street address is a match for the Place concept)

Consider the following examples:

  • Boston vs. Cleveland
  • Croatia beat Slovakia
  • The Vatican
  • Germany unveils new law
  • The White House
  • Eiffel Tower
  • Westminster Abbey
  • Stanford University
  • Disney World
  • The SAS Executive Briefing Center
  • The Apple headquarters
  • RDU airport
  • Bank of America Branch at 5983 N. Lincoln Avenue in Chicago, IL 60659
  • The Macy’s on Main Street

Pause and think: Can you identify which of the examples above contain matches for the Organization concept and which ones for Place?

Matches for the Organization concept include the following:

  • [Boston] vs. [Cleveland]
  • [Croatia] beat [Slovakia]
  • The [Vatican]
  • [Germany] unveils new law
  • The [White House]
  • [Westminster Abbey]
  • [Stanford University]
  • [Bank of America] Branch at 5983 N. Lincoln Avenue in Chicago, IL 60659
  • The [Macy’s] on Main Street

Matches for the Place concept include the following:

  • [Eiffel Tower]
  • [Disney World]
  • The [SAS Executive Briefing Center]
  • The [Apple headquarters]
  • [RDU airport]
  • Bank of America Branch at [5983 N. Lincoln Avenue in Chicago, IL 60659]
  • The Macy’s on [Main Street]

3.5.2. Organization or Product

An organization name or alias that is an explicit reference to a product or brand is a match for Organization. However, references to the products or brands themselves are not matches, and neither are ambiguous references for which context does not make clear that the organization itself is meant.

Consider the following examples:

  • Toyota Highlander
  • I drive a Porsche
  • Samsung Galaxy
  • . . . on Google maps
  • Johnson’s baby products
  • Johnson’s babies
  • She can’t find her Chapstick
  • Chapstick® Classic Lip Balm

Pause and think: Can you identify which of the examples above contain matches for the Organization concept?

Matches include the following:

  • [Toyota] Highlander
  • I drive a [Porsche]
  • [Samsung] Galaxy
  • . . . on [Google] maps
  • [Johnson’s] baby products
  • [Chapstick]® Classic Lip Balm

3.5.3. Organization or Person

Groups of individuals belonging to an organization match as Organization, such as [Democrats], [Girl Scouts], and [Marines]. However, groups of individuals who do not belong to a formally established association are not considered a match for Organization or Person. Thus, for example, members of a particular religion are not considered matches for Organization, but members of a particular formally established religious denomination or church may be.

Consider the following examples:

  • Christians
  • Baptists
  • Sunni
  • Muslims
  • Shia

Pause and think: Can you identify which of the examples above contain matches for the Organization concept?

Matches include the following:

  • [Baptists]
  • [Sunni]
  • [Shia]

Groups of individuals belonging to a particular industrial sector, industry, or job are not considered matches because they are not proper nouns. For example, the job description “financial advisors” is not a match for Person, but “[Bank of America] financial advisors” contains an Organization predefined concept match—the company where that group of individuals works.
