tox_doc_stats: Text Statistics for Documents


Description

This function is part of the 'text operation' (tox) function set.
It counts pages, words and characters, determines the file size and calculates ratios between these counts. Optionally, it also returns the language of the text.

Usage

tox_doc_stats(text, lan = c("single", "mixed"))

Arguments

text

A character string containing the text(s) to be analysed.

lan

Optional. If the language of the text is unknown, it can be determined. The accepted values are "single" and "mixed".

Value

A data frame with the following columns:
doc_id: the file name of the text file
n_page: number of pages
n_word: number of words
n_char: number of characters
r_word_page: ratio of words per page
r_char_word: ratio of characters per word
size: size of the text file in KB
lan_1: first detected language (optional)
lan_2: second detected language (optional)
rel_1: reliability of the first detected language (optional)
rel_2: reliability of the second detected language (optional)
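The two ratio columns follow directly from the counts: r_word_page is n_word / n_page and r_char_word is n_char / n_word. The base-R sketch below illustrates how such a table could be built. It is not the tpfunctions implementation. The helper name doc_stats_sketch, the assumption that the input holds one string per page, the whitespace-based word splitting, and the byte-based size estimate are all hypothetical choices made for illustration.

```r
# Hypothetical helper, NOT part of tpfunctions: a minimal sketch of the
# count and ratio columns, assuming `pages` holds one character string per page.
doc_stats_sketch <- function(pages, doc_id = "doc") {
  n_page <- length(pages)
  # Split on runs of whitespace and drop empty tokens to count words
  tokens <- unlist(strsplit(pages, "[[:space:]]+"))
  n_word <- sum(nzchar(tokens))
  # Count all characters, including spaces (assumption; tox_doc_stats may differ)
  n_char <- sum(nchar(pages))
  data.frame(
    doc_id = doc_id,
    n_page = n_page,
    n_word = n_word,
    n_char = n_char,
    r_word_page = n_word / n_page,   # words per page
    r_char_word = n_char / n_word,   # characters per word
    # Approximate size in KB from the text's byte count (no file on disk here)
    size = sum(nchar(pages, type = "bytes")) / 1024,
    stringsAsFactors = FALSE
  )
}

stats <- doc_stats_sketch(c("The quick brown fox", "jumps over the lazy dog"))
print(stats)
```

For this two-page input the sketch counts 9 words and 42 characters, giving 4.5 words per page. Whether tox_doc_stats counts spaces as characters or splits words the same way is not stated in this documentation.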


M-U-UNI-MA/tpfunctions documentation built on May 24, 2019, 7:37 a.m.