A tokenlist is a data.frame in which each row represents a token of a text (e.g., a word, lemma or ngram). This function creates a tokenlist that is ordered by document ('doc_id' column) and by the position of the token within the text ('position' column).
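To make the structure concrete, the following is a minimal base R sketch of a data.frame with the shape described above. It is not the semnet implementation: the whitespace split is an assumed stand-in for the quanteda tokenizer, and the texts and document ids are made up for illustration.

```r
# Illustrative only: a base R stand-in, not the semnet implementation.
texts   <- c("Second text here", "First text")
doc_ids <- c("b", "a")

# Naive whitespace tokenization (assumption; the package relies on quanteda).
tokens <- strsplit(texts, "\\s+")

# One row per token, with the document id and the position within the text.
tokenlist <- data.frame(
  doc_id   = rep(doc_ids, lengths(tokens)),
  position = sequence(lengths(tokens)),
  word     = unlist(tokens),
  stringsAsFactors = FALSE
)

# Order by document, then by position within the document.
tokenlist <- tokenlist[order(tokenlist$doc_id, tokenlist$position), ]
rownames(tokenlist) <- NULL
print(tokenlist)
```

The explicit `order()` call mirrors the ordering that the description promises, so functions that walk through a document token by token can rely on row order.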
x |
An object that can be transformed into a tokenlist object. This can be 1) a list of the tokenizedTexts class (quanteda), 2) a data.frame with document id, position and word columns (see below for an explanation of the column names), or 3) a character vector, in which case the tokenize function of the quanteda package is used. |
doc_id |
If the input is a tokenizedTexts list or a character vector, a doc_id vector can be given to define the document ids (otherwise, the list or vector indices are used). |
doc.col |
The name of the document id column. Defaults to "doc_id", unless a global default is specified using setTokenlistColnames() |
position.col |
The name of the column giving the position in a document. Defaults to "position", unless a global default is specified using setTokenlistColnames() |
word.col |
The name of the column containing the token text. Defaults to "word", unless a global default is specified using setTokenlistColnames() |
... |
If x is a character vector, additional arguments will be passed to the tokenize function of the quanteda package |
When x is a character vector, tokenization is handled by the tokenize function of the quanteda package, and any additional arguments (...) are passed on to it.
The default column names for the tokenlist are "doc_id", "position" and "word". Functions in semnet that take a tokenlist as an argument assume that these column names are used. If alternative column names are preferred, they can be specified in two ways. First, they can be set per call using the doc.col, position.col and word.col parameters. Second, defaults can be set globally using the setTokenlistColnames() function.
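The column-name mapping can be sketched in base R as follows. The helper `standardize_cols()` is hypothetical and not part of semnet. It only illustrates the idea behind the doc.col, position.col and word.col parameters, assuming they name existing columns of the input data.frame.

```r
# Hypothetical helper, not part of semnet: map custom column names onto
# the default tokenlist names "doc_id", "position" and "word".
standardize_cols <- function(df, doc.col = "doc_id",
                             position.col = "position", word.col = "word") {
  # Fail early if any of the named columns is missing.
  stopifnot(all(c(doc.col, position.col, word.col) %in% names(df)))
  out <- data.frame(
    doc_id   = df[[doc.col]],
    position = df[[position.col]],
    word     = df[[word.col]],
    stringsAsFactors = FALSE
  )
  # Keep the tokenlist ordering: by document, then by position.
  out <- out[order(out$doc_id, out$position), , drop = FALSE]
  rownames(out) <- NULL
  out
}

# Made-up input that uses alternative column names.
tokens <- data.frame(
  article = c(1, 1, 2),
  idx     = c(2, 1, 1),
  token   = c("world", "hello", "bye"),
  stringsAsFactors = FALSE
)

renamed <- standardize_cols(tokens, doc.col = "article",
                            position.col = "idx", word.col = "token")
print(renamed)
```

Passing the names per call, as here, keeps the choice local to one function call, whereas a global default changes the names assumed by every subsequent call.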