features: Features.

features R Documentation

Features.

Description

Module containing functions that each extract one simple feature from a text.

Usage

features

Format

An object of class module (inherits from list) of length 19.

Details

Most functions take a single parameter, text. The module contains the following functions:

Stopwords

Number of stopwords. Takes two optional parameters: Tokenize, the word tokenizer to use, and stopwords, the list of stopwords to use.
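
A minimal base-R sketch of counting stopwords with a pluggable tokenizer and stopword list, for illustration only. simple_tokenize, count_stopwords, and the short stopword list are hypothetical; they are not NLoN's Tokenize1, Tokenize2, or its actual stopword list.

```r
# Hypothetical tokenizer: lowercase, then split on runs of non-letters.
simple_tokenize <- function(text) {
  tokens <- strsplit(tolower(text), "[^[:alpha:]]+")
  # Drop empty strings produced by leading separators.
  lapply(tokens, function(t) t[nzchar(t)])
}

# Hypothetical stand-in for Stopwords: the tokenizer and the stopword
# list are both swappable, mirroring the two optional parameters.
count_stopwords <- function(text, tokenize = simple_tokenize,
                            stopwords = c("a", "an", "and", "is", "of", "the", "to")) {
  vapply(tokenize(text), function(tokens) sum(tokens %in% stopwords), integer(1))
}

count_stopwords(c("The cat is on the mat", "x <- foo(bar)"))  # 3 0
```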

Tokenize1

First tokenizer available for Stopwords.

Tokenize2

Second tokenizer available for Stopwords.

StopwordsRatio1

Ratio of stopwords, using Tokenize1.

StopwordsRatio2

Ratio of stopwords, using Tokenize2.
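
This page does not say how Tokenize1 and Tokenize2 differ or which denominator the ratios use. The sketch below assumes the ratio is stopwords divided by tokens and uses two hypothetical tokenizers (whitespace-based and letter-based) to show why the choice of tokenizer matters on code-like text. None of these names come from NLoN.

```r
# Drop empty tokens left by leading separators.
drop_empty <- function(tokens) lapply(tokens, function(t) t[nzchar(t)])

# Hypothetical tokenizer A: split on whitespace only.
tokenize_whitespace <- function(text) drop_empty(strsplit(tolower(text), "[[:space:]]+"))

# Hypothetical tokenizer B: split on anything that is not a letter.
tokenize_letters <- function(text) drop_empty(strsplit(tolower(text), "[^[:alpha:]]+"))

# Assumed definition: stopwords / tokens, NA for texts with no tokens.
stopwords_ratio <- function(text, tokenize,
                            stopwords = c("a", "an", "and", "is", "of", "the", "to")) {
  vapply(tokenize(text), function(tokens) {
    if (length(tokens) == 0) return(NA_real_)
    sum(tokens %in% stopwords) / length(tokens)
  }, numeric(1))
}

txt <- "Check foo(the) and bar."
stopwords_ratio(txt, tokenize_whitespace)  # 0.25: "foo(the)" is one token
stopwords_ratio(txt, tokenize_letters)     # 0.4: "the" is split out
```

With whitespace splitting, a stopword glued to punctuation is missed, so the two ratios diverge on text that mixes code and prose.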

Caps

Number of uppercase letters.

CapsRatio

Ratio of uppercase letters.
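
A base-R sketch of the two uppercase features. count_caps and caps_ratio are hypothetical names, and dividing by the total number of characters is an assumption this page does not confirm.

```r
# Count characters in the POSIX upper-case class.
# regmatches() returns character(0) for texts without a match,
# so lengths() yields 0 for them.
count_caps <- function(text) {
  lengths(regmatches(text, gregexpr("[[:upper:]]", text)))
}

# Assumption: ratio over the total number of characters (NaN for "").
caps_ratio <- function(text) count_caps(text) / nchar(text)

count_caps(c("Hello World", "no caps", "NLoN"))  # 2 0 3
caps_ratio("NLoN")                               # 0.75
```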

SpecialChars

Number of special characters.

SpecialCharsRatio

Ratio of special characters.
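
This page does not define which characters count as special. The hypothetical sketch below assumes any character that is not a letter, digit, or whitespace, and again assumes the ratio divides by the total character count.

```r
# Assumption: "special" = not a letter, digit, or whitespace character.
count_special <- function(text) {
  lengths(regmatches(text, gregexpr("[^[:alnum:][:space:]]", text)))
}

# Assumption: ratio over the total number of characters (NaN for "").
special_ratio <- function(text) count_special(text) / nchar(text)

count_special(c("if (x > 0) { y++; }", "Plain text here"))  # 8 0
special_ratio("a+b")                                        # 0.3333333
```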

Numbers

Number of digit characters.

NumbersRatio

Ratio of digit characters.
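
The digit features can be sketched the same way; count_digits and digits_ratio are hypothetical names, and the character-count denominator is an assumption.

```r
# Count characters matching the POSIX digit class.
count_digits <- function(text) {
  lengths(regmatches(text, gregexpr("[[:digit:]]", text)))
}

# Assumption: ratio over the total number of characters.
digits_ratio <- function(text) count_digits(text) / nchar(text)

count_digits(c("abc123", "no digits"))  # 3 0
digits_ratio("abc123")                  # 0.5
```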

Words

Number of words.
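
This page does not say how words are delimited. Assuming a word is a maximal run of non-whitespace characters, a word count can be sketched as follows (count_words is a hypothetical name):

```r
# Assumption: a word is a maximal run of non-whitespace characters.
count_words <- function(text) {
  lengths(regmatches(text, gregexpr("[^[:space:]]+", text)))
}

count_words(c("one two  three", "   ", "x<-1"))  # 3 0 1
```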

AverageWordLength

Average word length.
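
A hypothetical sketch of average word length, under the same assumed word definition (runs of non-whitespace characters, so attached punctuation counts toward a word's length):

```r
# Assumed word definition: runs of non-whitespace characters.
average_word_length <- function(text) {
  words <- regmatches(text, gregexpr("[^[:space:]]+", text))
  # Return NA rather than NaN for texts that contain no words.
  vapply(words, function(w) if (length(w) == 0) NA_real_ else mean(nchar(w)),
         numeric(1))
}

average_word_length(c("a bb ccc", ""))  # 2 NA
```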

LastCharCode

Boolean indicating whether the text ends with a character typical of source code.
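
This page does not list which characters count as code characters. The sketch below assumes a small hypothetical set (;, {, }, and a closing parenthesis) and ignores trailing whitespace; ends_with_code_char is not an NLoN function.

```r
# Assumption: characters that commonly end a line of source code.
ends_with_code_char <- function(text) {
  # Allow trailing whitespace after the final character.
  grepl("[;{})][[:space:]]*$", text)
}

ends_with_code_char(c("x <- foo(1)", "int y = 2;  ", "This is a sentence."))
# TRUE TRUE FALSE
```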

LastCharNL

Boolean indicating whether the text ends with a character typical of natural language.
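
A counterpart sketch, assuming (not confirmed here) that sentence-ending punctuation is what marks a natural-language ending; ends_with_nl_char is a hypothetical name.

```r
# Assumption: sentence-ending punctuation marks natural-language text.
ends_with_nl_char <- function(text) {
  # Allow trailing whitespace after the final character.
  grepl("[.!?][[:space:]]*$", text)
}

ends_with_nl_char(c("Does it compile?", "return x;", "Thanks!  "))
# TRUE FALSE TRUE
```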

First3Chars

Returns the first three non-whitespace characters.
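
One reading of "first three non-whitespace characters" is to drop every whitespace character and keep the first three that remain. The hypothetical sketch below follows that interpretation; texts with fewer than three such characters return what is left.

```r
# Interpretation: remove all whitespace, then take up to three characters.
first3_chars <- function(text) {
  substr(gsub("[[:space:]]", "", text), 1, 3)
}

first3_chars(c("  def foo():", "\tHi", "a b c d"))  # "def" "Hi" "abc"
```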

First3CharsLetters

Number of letters among the first three non-whitespace characters.
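
Under the same interpretation of the first three non-whitespace characters, the letter count can be sketched as follows (first3_letters is a hypothetical name):

```r
# Same interpretation: strip whitespace, keep up to three characters.
first3_letters <- function(text) {
  first3 <- substr(gsub("[[:space:]]", "", text), 1, 3)
  # Count how many of those characters are letters.
  lengths(regmatches(first3, gregexpr("[[:alpha:]]", first3)))
}

first3_letters(c("  def foo():", "// 42", "x1 = 2"))  # 3 0 1
```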

Emoticons

Number of emoticons.
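
This page does not say which emoticons are recognized. The sketch below assumes a tiny, hypothetical set of ASCII emoticons (an eye character, an optional nose, and a mouth character); emoticon_pattern and count_emoticons are illustrative names.

```r
# Assumption: a tiny illustrative set of ASCII emoticons such as :) ;-D =P
emoticon_pattern <- "[;:=]-?[)(DPp]"

count_emoticons <- function(text) {
  lengths(regmatches(text, gregexpr(emoticon_pattern, text)))
}

count_emoticons(c("Thanks :) see you ;-D", "if (a) { b(); }"))  # 2 0
```

A pattern this loose also matches code such as x=(1), so a real feature would need a more careful definition.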

StartWithAt

Boolean indicating whether the text starts with @.
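
A one-line hypothetical sketch; whether leading whitespace before the @ should be ignored is an assumption, not something this page states.

```r
# Assumption: leading whitespace before the @ is ignored.
starts_with_at <- function(text) grepl("^[[:space:]]*@", text)

starts_with_at(c("@alice thanks!", "  @Override", "email me at a@b.c"))
# TRUE TRUE FALSE
```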


M3SOulu/NLoN documentation built on June 20, 2022, 6 p.m.