Detects the languages used in documents.

Description

This function returns the language detected in each sentence or document, along with a confidence score between 0 and 1. A score equal to 1 indicates 100% certainty.

Internally, this function invokes the Microsoft Cognitive Services Text Analytics REST API documented at https://www.microsoft.com/cognitive-services/en-us/text-analytics/documentation.

You MUST have a valid Microsoft Cognitive Services account and an API key for this function to work properly. See https://www.microsoft.com/cognitive-services/en-us/pricing for details.

Usage

textaDetectLanguages(documents, numberOfLanguagesToDetect = 1L)

Arguments

documents

(character vector) Vector of sentences or documents on which to perform language detection.

numberOfLanguagesToDetect

(integer) Number of languages to detect. Set to 1 by default. Use a higher value if individual documents contain a mix of languages.
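
As a minimal sketch of preparing these arguments, the helper below coerces input to a character vector and drops missing or empty entries before they are passed as documents. The name prepareDocuments is hypothetical and not part of the package. Whether textaDetectLanguages performs such checks itself is not stated here, so treat this as an optional precaution.

```r
# Hypothetical helper, not part of the package: cleans input before it is
# passed as the `documents` argument.
prepareDocuments <- function(x) {
  x <- as.character(x)        # coerce factors and other types to character
  x <- trimws(x)              # strip surrounding whitespace
  x[!is.na(x) & nzchar(x)]    # drop NA and empty strings
}

docs <- prepareDocuments(c("Der Louvre ist ein Museum in Paris.", "  ", NA))

# The L suffix makes this an integer, matching numberOfLanguagesToDetect
n <- 2L

# With a valid API key configured, the call would then be:
# docsLanguage <- textaDetectLanguages(documents = docs,
#                                      numberOfLanguagesToDetect = n)
```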

Value

An S3 object of the class texta. The results are stored in the results dataframe inside this object. The dataframe contains the original sentences or documents, the name of the detected language, the ISO 639-1 code of the detected language, and a confidence score. If an error occurred during processing, the dataframe will also have an error column that describes the error.
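
To illustrate working with the results dataframe, the sketch below builds a mock dataframe with the columns shown in the printed output of the Examples section (text, name, iso6391Name, score) and keeps only confident detections. The values, the 0.8 threshold, and the mockResults name are illustrative assumptions. With a real call, the dataframe would be docsLanguage$results.

```r
# Mock data standing in for docsLanguage$results (values are illustrative,
# not real API output).
mockResults <- data.frame(
  text = c("Der Louvre ist ein Museum in Paris.", "Louvre"),
  name = c("German", "French"),
  iso6391Name = c("de", "fr"),
  score = c(1, 0.5),
  stringsAsFactors = FALSE
)

threshold <- 0.8  # assumed cut-off; pick one that suits your data

# Keep only detections at or above the threshold
confident <- mockResults[mockResults$score >= threshold, ]

# The error column is only present if processing failed, so check first
hasErrors <- "error" %in% names(mockResults)
```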

Author(s)

Phil Ferriere pferriere@hotmail.com

Examples

## Not run: 

 docsText <- c(
   "The Louvre or the Louvre Museum is the world's largest museum.",
   "Le musee du Louvre est un musee d'art et d'antiquites situe au centre de Paris.",
   "El Museo del Louvre es el museo nacional de Francia.",
   "Il Museo del Louvre a Parigi, in Francia, e uno dei piu celebri musei del mondo.",
   "Der Louvre ist ein Museum in Paris."
 )

 tryCatch({

   # Detect languages used in documents
   docsLanguage <- textaDetectLanguages(
     documents = docsText,           # Input sentences or documents
     numberOfLanguagesToDetect = 1L  # Number of languages to detect
   )

   # Class and structure of docsLanguage
   class(docsLanguage)
   #> [1] "texta"
   str(docsLanguage, max.level = 1)
   #> List of 3
   #>  $ results:'data.frame': 5 obs. of  4 variables:
   #>  $ json   : chr "{\"documents\":[{\"id\":\"B6e4C\",\"detectedLanguages\": __truncated__ }]}
   #>  $ request:List of 7
   #>   ..- attr(*, "class")= chr "request"
   #>  - attr(*, "class")= chr "texta"

   # Print results
   docsLanguage
   #> texta [https://westus.api.cognitive.microsoft.com/text/analytics/v2.0/lan __truncated__ ]
   #>
   #> -----------------------------------------------------------
   #>             text               name    iso6391Name   score
   #> ----------------------------- ------- ------------- -------
   #>   The Louvre or the Louvre    English      en          1
   #> Museum is the world's largest
   #>            museum.
   #>
   #>   Le musee du Louvre est un    French      fr          1
   #>  musee d'art et d'antiquites
   #>   situe au centre de Paris.
   #>
   #>   El Museo del Louvre es el   Spanish      es          1
   #>  museo nacional de Francia.
   #>
   #> Il Museo del Louvre a Parigi, Italian      it          1
   #>   in Francia, e uno dei piu
   #>   celebri musei del mondo.
   #>
   #>  Der Louvre ist ein Museum in  German      de          1
   #>            Paris.
   #> -----------------------------------------------------------

 }, error = function(err) {

   # Print the message of the caught error
   message(conditionMessage(err))

 })

## End(Not run)