GuessLanguagePipe: Class to guess the language of an Instance

Class to guess the language of an Instance

Description

This class guesses the language of the text by using the language detector of the cld2 library. It creates the language property, which indicates the language of the text. Optionally, for tweets, the language provided by Twitter can be used instead.

Details

To obtain the language of tweets, the pipe checks whether a JSON file with the tweet's information has already been cached. For this to work, the "cache.twitter.path" field of the bdpar.Options variable must be defined, so that the pipe knows where the tweet information is saved.
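As an illustration, the cache lookup described above could be sketched in R as follows. This is not bdpar's implementation: the function name readCachedTwitterLanguage, the "<cache.twitter.path>/<tweetId>.json" file naming and the use of the "lang" field of the cached tweet are assumptions, and the regular expression only stands in for a proper JSON parser.

```r
# Hypothetical sketch, not bdpar's code: read the language Twitter reported
# for a tweet from a local JSON cache. The "<cachePath>/<tweetId>.json" naming
# and the "lang" field lookup are assumptions made for illustration.
readCachedTwitterLanguage <- function(cachePath, tweetId) {
  jsonFile <- file.path(cachePath, paste0(tweetId, ".json"))
  if (!file.exists(jsonFile)) {
    return(NA_character_)  # nothing cached: the caller must use the detector
  }
  txt <- paste(readLines(jsonFile, warn = FALSE), collapse = "")
  # Dependency-free regex extraction of the first "lang": "xx" pair;
  # a real implementation would parse the file with a JSON library.
  hit <- regmatches(txt, regexpr('"lang"[[:space:]]*:[[:space:]]*"[^"]*"', txt))
  if (length(hit) == 0) {
    return(NA_character_)  # cached file has no language field
  }
  sub('.*"([^"]*)"$', "\\1", hit)  # keep only the value, e.g. "es"
}

# Usage with a throwaway cache directory
cacheDir <- tempfile("twitter-cache-")
dir.create(cacheDir)
writeLines('{"id_str": "123", "lang": "es", "text": "Hola"}',
           file.path(cacheDir, "123.json"))
readCachedTwitterLanguage(cacheDir, "123")  # "es"
readCachedTwitterLanguage(cacheDir, "999")  # NA
```

Returning NA for a missing file or field keeps the lookup side-effect free and leaves the decision about falling back to the detector to the caller.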

Note

The pipe invalidates the Instance if the language of the data cannot be detected.
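The rule in this note can be sketched with a plain R list standing in for an Instance. bdpar's real Instance is an R6 object; newInstance, applyLanguage and the stub detector below are hypothetical names used only for illustration.

```r
# Hypothetical stand-in for an Instance: a plain list holding the text,
# a property list and a validity flag (not bdpar's R6 Instance class).
newInstance <- function(data) list(data = data, properties = list(), isValid = TRUE)

# Sketch of the documented rule: store the guessed language under the
# property name and invalidate the instance when no language is detected.
applyLanguage <- function(instance, detector, propertyName = "language") {
  lang <- detector(instance$data)
  instance$properties[[propertyName]] <- lang
  if (is.na(lang)) {
    instance$isValid <- FALSE  # language could not be detected
  }
  instance
}

# Stub detector for illustration: treats very short text as undetectable
stubDetector <- function(text) if (nchar(text) < 5) NA_character_ else "en"

ok  <- applyLanguage(newInstance("This is an English sentence"), stubDetector)
bad <- applyLanguage(newInstance("?!"), stubDetector)
ok$isValid   # TRUE
bad$isValid  # FALSE
```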

Inherit

This class inherits from GenericPipe and implements the pipe abstract function.

Super class

bdpar::GenericPipe -> GuessLanguagePipe

Methods

Public methods

Method new()

Creates a GuessLanguagePipe object.

Usage
GuessLanguagePipe$new(
  propertyName = "language",
  alwaysBeforeDeps = list("StoreFileExtPipe", "TargetAssigningPipe"),
  notAfterDeps = list(),
  languageTwitter = TRUE
)
Arguments
propertyName

A character value. Name of the property associated with the GenericPipe.

alwaysBeforeDeps

A list value. The dependencies alwaysBefore (GenericPipes that must be executed before this one).

notAfterDeps

A list value. The dependencies notAfter (GenericPipes that cannot be executed after this one).

languageTwitter

A logical value. For Instances of type twtid, indicates whether the language returned by the Twitter API is used (TRUE) or the language detector is applied (FALSE).
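A minimal sketch of how a flag like languageTwitter could select the language source. The function chooseLanguage and the stub detector are hypothetical, and falling back to the detector when no cached Twitter language is available is an assumption rather than documented behaviour.

```r
# Hypothetical sketch: pick the language source for one instance.
# `cachedLang` is the language read from the Twitter cache (NA if absent);
# `detector` stands in for the cld2-based language detector.
chooseLanguage <- function(extension, languageTwitter, cachedLang, text, detector) {
  useTwitter <- languageTwitter && identical(extension, "twtid") && !is.na(cachedLang)
  if (useTwitter) {
    cachedLang      # trust the language reported by the Twitter API
  } else {
    detector(text)  # otherwise run the detector on the text (assumed fallback)
  }
}

# Stub detector used only for illustration
stubDetector <- function(text) "en"

chooseLanguage("twtid", TRUE,  "es", "Hola", stubDetector)  # "es"
chooseLanguage("twtid", FALSE, "es", "Hola", stubDetector)  # "en"
chooseLanguage("eml",   TRUE,  NA,   "Hi",   stubDetector)  # "en"
```

Only tweets are affected by the flag; every other extension always goes through the detector.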


Method pipe()

Preprocesses the Instance to obtain the language of the data.

Usage
GuessLanguagePipe$pipe(instance)
Arguments
instance

An Instance value. The Instance to preprocess.

Returns

The Instance with the modifications that have occurred in the pipe.


Method getLanguage()

Guesses the language of the data.

Usage
GuessLanguagePipe$getLanguage(data)
Arguments
data

A character value. The text whose language is to be guessed.

Returns

The guessed language. Format: see ISO 639-3:2007.


Method clone()

The objects of this class are cloneable with this method.

Usage
GuessLanguagePipe$clone(deep = FALSE)
Arguments
deep

Whether to make a deep clone.

See Also

AbbreviationPipe, bdpar.Options, ContractionPipe, File2Pipe, FindEmojiPipe, FindEmoticonPipe, FindHashtagPipe, FindUrlPipe, FindUserNamePipe, GuessDatePipe, Instance, InterjectionPipe, MeasureLengthPipe, GenericPipe, SlangPipe, StopWordPipe, StoreFileExtPipe, TargetAssigningPipe, TeeCSVPipe, ToLowerCasePipe


bdpar documentation built on Aug. 22, 2022, 5:08 p.m.