Consider a speaker who uses the term allow repeatedly throughout a speech, compared with another speaker who uses the terms allow, concur, acquiesce, accede, and avow to express the same idea. The latter speech has more lexical diversity than the former. Lexical diversity is widely regarded as an important parameter for rating a document's textual richness and effectiveness.
Lexical diversity, in simple terms, is a measure of the breadth and variety of the vocabulary used in a document. Common measures of lexical diversity are TTR, MSTTR, MATTR, C, R, CTTR, U, S, K, Maas, HD-D, MTLD, and MTLD-MA.
The koRpus package in R provides functions to estimate lexical diversity or complexity.
If N is the total number of tokens and V is the number of types:
Measure | Description | Wrapper Function (koRpus package in R) |
---|---|---|
TTR | Type-Token Ratio | TTR |
MSTTR | Mean segment type token ratio | MSTTR |
C | logTTR | C.ld |
R | Root TTR | R.ld |
CTTR | Corrected TTR | CTTR |
U | Uber Index | U.ld |
S | Summer index | S.ld |
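
For reference, the count-based measures in the table are commonly defined as follows, using N and V as above. This is a summary of the usual textbook forms, not a statement of how koRpus computes them internally. The logarithm base is an assumption and affects the value of U. MSTTR is omitted because it is the mean of the TTR values over equal-sized text segments rather than a function of N and V alone.

$$
\begin{aligned}
\mathrm{TTR} &= \frac{V}{N} \\
C &= \frac{\log V}{\log N} \\
R &= \frac{V}{\sqrt{N}} \\
\mathrm{CTTR} &= \frac{V}{\sqrt{2N}} \\
U &= \frac{(\log N)^2}{\log N - \log V} \\
S &= \frac{\log(\log V)}{\log(\log N)}
\end{aligned}
$$
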
The lex.div function computes all the lexical diversity measures described previously. If you are only interested in estimating one of the measures, you can use the wrapper functions listed in the table instead of lex.div:
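library(koRpus)
# tagged.text is assumed to be a tagged text object, e.g. from tokenize() or treetag()
lex.div(tagged.text)                    # all lexical diversity measures
ttr.res <- TTR(tagged.text, char=TRUE)  # TTR only, including its characteristics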
The lex.div.num function is a reduced version of lex.div: as arguments it requires only the number of tokens and the number of types, and it calculates lexical diversity from them. Measures such as TTR, C, R, CTTR, U, S, and Maas can be estimated with this function:
lex.div.num(N, V)
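
To make the dependence on N and V concrete, here is a minimal base-R sketch that computes a few of these measures directly from the two counts. The helper name ld_from_counts, its log.base argument, and the toy counts are assumptions made for illustration. It is not part of koRpus and is not claimed to reproduce the output of lex.div.num.

```r
# Hypothetical helper (not part of koRpus): count-based lexical diversity
# measures computed from N (number of tokens) and V (number of types).
ld_from_counts <- function(N, V, log.base = 10) {
  # Require at least two types so that log(log(V)) is defined
  stopifnot(N > 1, V > 1, V <= N)
  logN <- log(N, base = log.base)
  logV <- log(V, base = log.base)
  c(
    TTR  = V / N,                    # type-token ratio
    C    = logV / logN,              # logTTR
    R    = V / sqrt(N),              # root TTR
    CTTR = V / sqrt(2 * N),          # corrected TTR
    U    = logN^2 / (logN - logV),   # Uber index; Inf when V == N
    S    = log(logV, base = log.base) / log(logN, base = log.base)
  )
}

# Toy counts, assumed for illustration only
res <- ld_from_counts(N = 100, V = 50)
print(res)
```

With these toy counts, TTR is 50/100 and R is 50/sqrt(100). Working from raw counts like this is only a way to see how the formulas behave; for real texts, the koRpus functions remain the intended route.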
Readability measures provide quantitative ways to analyze the complexity and quality of a text document.
The ARI function does not count syllables. When the parameter is specified as "NRI", the Navy Readability Index is calculated, while if it is set to "simple", the simplified formula is calculated.
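
For orientation, the Automated Readability Index in its standard published form depends only on character, word, and sentence counts, which is why no syllable counting is needed. The formula below is the commonly cited default. The exact coefficients used by the "NRI" and "simple" variants are not reproduced here.

$$
\mathrm{ARI} = 0.5 \cdot \frac{\text{words}}{\text{sentences}} + 4.71 \cdot \frac{\text{characters}}{\text{words}} - 21.43
$$
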
Apart from ARI, the koRpus package provides other functions for readability analysis, such as bormuth, Degrees of Reading Power (DRP), Easy Listening Formula (ELF), dickes.steiwer, danielson.bryan, and dale.chall, to estimate different readability indices.