Language detection

TextCat is a text classification utility whose primary use is language identification. The textcat package in R provides wrapper functions for n-gram based text categorization and language detection. The TC_byte_profiles database used below contains 75 language profiles:

> library(textcat)
> my.profiles <- TC_byte_profiles[names(TC_byte_profiles)]
> my.profiles

A textcat profile db of length 75.

> my.text <- c("This book is in English language",
+              "Das ist ein deutscher Satz.",
+              "Il s'agit d'une phrase française.",
+              "Esta es una frase en español.")
> textcat(my.text, p = my.profiles)

[1] "english" "german"  "french"  "spanish"
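
When the candidate languages are known in advance, the profile database can be narrowed before classification. The following is a minimal sketch, assuming the four profile names that appear in the output above; the variable names wanted and small.profiles are illustrative only and are not part of the package.

```r
library(textcat)

# Profile names taken from the output above; adjust to the languages you expect.
wanted <- c("english", "german", "french", "spanish")

# Subset the profile database by name, the same way my.profiles was built.
small.profiles <- TC_byte_profiles[wanted]

my.text <- c("This book is in English language",
             "Das ist ein deutscher Satz.")

# Classify against the reduced set of profiles only.
textcat(my.text, p = small.profiles)
```

With fewer profiles to compare against, each text is matched only against languages that are plausible for the data at hand.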