MetaNLP | R Documentation |
The MetaNLP package provides methods to quickly transform a CSV-file with titles and abstracts to an R data frame that can be used for automatic title-abstract screening using machine learning.
MetaNLP is the base class of the package MetaNLP.
An object of this class is initialized by passing the path to a CSV file
(or a data frame). It constructs a data frame whose column names are the
words that occur in the titles and abstracts and whose cells contain the
word frequencies for each paper.
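To make the shape of such a data frame concrete, here is a minimal base R sketch. It is not the package's implementation: the two example texts are made up, the tokenization is a plain split on non-letters, and no lemmatization or stemming is applied.

```r
# Toy input; these texts are made up for illustration.
papers <- data.frame(
  ID = c("p1", "p2"),
  text = c("Random forests for screening abstracts",
           "Screening abstracts with random search"),
  stringsAsFactors = FALSE
)

# Lower-case the text and split it into words on non-letter characters.
tokens <- strsplit(tolower(papers$text), "[^a-z]+")

# Collect the vocabulary and count each word per paper.
vocab <- sort(unique(unlist(tokens)))
counts <- t(vapply(
  tokens,
  function(words) as.integer(table(factor(words, levels = vocab))),
  integer(length(vocab))
))
colnames(counts) <- vocab

# One row per paper, one column per word, cells hold word frequencies.
word_counts <- data.frame(ID = papers$ID, counts, check.names = FALSE)
print(word_counts)
```

Each row corresponds to one paper and each remaining column to one word, which is the layout described above.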
MetaNLP(
file,
bounds = c(2, Inf),
word_length = c(3, Inf),
language = "english",
...
)
file |
Either the path to the CSV file or a data frame containing the abstracts |
bounds |
An integer vector of length 2. The first value specifies
the minimum number of appearances of a word to become a column of the word
count matrix, the second value specifies the maximum number.
Defaults to c(2, Inf). |
word_length |
An integer vector of length 2. The first value specifies
the minimum number of characters of a word to become a column of the word
count matrix, the second value specifies the maximum number.
Defaults to c(3, Inf). |
language |
The language for lemmatization and stemming. Supported
languages are |
... |
Additional arguments passed on to |
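The effect of bounds and word_length can be pictured as a column filter on a word count matrix. The following base R sketch rests on an assumption: "number of appearances" is read as the total count of a word over all papers, which may differ from how the package counts. The matrix and its words are hypothetical.

```r
# Hypothetical word count matrix: rows are papers, columns are words.
counts <- matrix(
  c(1L, 0L, 2L, 1L,
    0L, 1L, 1L, 1L),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("p1", "p2"), c("of", "trial", "random", "meta"))
)

bounds <- c(2, Inf)       # default from the usage above
word_length <- c(3, Inf)  # default from the usage above

# Assumption: appearances are summed over all papers.
total <- colSums(counts)
n_char <- nchar(colnames(counts))

# Keep a word only if both its total count and its length are within range.
keep <- total >= bounds[1] & total <= bounds[2] &
  n_char >= word_length[1] & n_char <= word_length[2]

filtered <- counts[, keep, drop = FALSE]
print(colnames(filtered))
```

With these defaults, words that appear only once or have fewer than three characters are dropped from the columns.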
An object of class MetaNLP contains a slot data_frame where
the document-term matrix is stored as a data frame.
The CSV file must have a column ID to identify each paper, a column title
with the titles of the papers, and a column abstract which contains the
abstracts. If the CSV stores training data, a column decision should also
exist, indicating whether an abstract is included in the meta-analysis.
This column is optional, because test data do not have decisions yet.
Allowed values in this column are "yes" and "no", "include" and "exclude",
or "maybe". The value "maybe" is handled as a "yes"/"include".
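The expected input layout can be sketched in base R as follows. The records are made up, and check_columns and is_included are hypothetical helpers written for this illustration; they are not functions of the MetaNLP package.

```r
# Made-up example records; only the column names follow the description above.
papers <- data.frame(
  ID = 1:3,
  title = c("Title A", "Title B", "Title C"),
  abstract = c("First abstract.", "Second abstract.", "Third abstract."),
  decision = c("yes", "no", "maybe"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: check that the required columns are present.
check_columns <- function(df) {
  missing <- setdiff(c("ID", "title", "abstract"), names(df))
  if (length(missing) > 0) {
    stop("Missing column(s): ", paste(missing, collapse = ", "))
  }
  invisible(TRUE)
}
check_columns(papers)

# Hypothetical helper: map decision labels to a logical inclusion flag,
# treating "maybe" like "yes"/"include" as described above.
is_included <- function(decision) {
  allowed <- c("yes", "no", "include", "exclude", "maybe")
  stopifnot(all(decision %in% allowed))
  decision %in% c("yes", "include", "maybe")
}
print(is_included(papers$decision))
```

Since the file argument also accepts a data frame, a data frame of this shape could be passed instead of a CSV path.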
An object of class MetaNLP
To ensure correct processing of the data when there are special characters
(e.g. "é" or "ü"), make sure that the CSV file is correctly encoded
as UTF-8.
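One way to check a file before passing it on is to test its bytes with base R's validUTF8(). The sketch below uses a hypothetical helper, file_is_utf8, and two temporary demo files; it is a convenience check and not part of the package.

```r
# Hypothetical helper: TRUE if every line of the file is valid UTF-8.
file_is_utf8 <- function(path) {
  lines <- readLines(path, warn = FALSE)
  all(validUTF8(lines))
}

# Demo file containing "café", written as UTF-8 bytes.
utf8_file <- tempfile(fileext = ".csv")
writeLines("caf\u00e9", utf8_file, useBytes = TRUE)

# Demo file containing the same word, written as Latin-1 bytes.
latin1_file <- tempfile(fileext = ".csv")
writeLines(iconv("caf\u00e9", from = "UTF-8", to = "latin1"),
           latin1_file, useBytes = TRUE)

print(file_is_utf8(utf8_file))
print(file_is_utf8(latin1_file))
```

A file that fails this check can be re-encoded, for example with iconv(), before it is used.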
The stemming algorithm makes use of the C libstemmer library generated by
Snowball. When German texts are stemmed, umlauts are replaced by their
non-umlaut equivalents; for example, "ä" becomes "a".
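The umlaut replacement described above can be imitated in base R. The helper replace_umlauts is hypothetical and only illustrates the documented effect; it does not show how the stemmer works internally, and mapping the upper-case umlauts as well is an assumption.

```r
# Hypothetical helper imitating the documented effect ("ä" becomes "a" etc.).
replace_umlauts <- function(x) {
  from <- c("\u00e4", "\u00f6", "\u00fc", "\u00c4", "\u00d6", "\u00dc")
  to   <- c("a", "o", "u", "A", "O", "U")
  # Replace each umlaut with its plain vowel, one character at a time.
  for (i in seq_along(from)) {
    x <- gsub(from[i], to[i], x, fixed = TRUE)
  }
  x
}

print(replace_umlauts(c("\u00e4rzte", "f\u00fcr", "sch\u00f6n")))
```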
Maintainer: Maximilian Pilz maximilian.pilz@itwm.fraunhofer.de (ORCID)
Authors:
Nico Bruder brudernico@gmail.com (ORCID)
Samuel Zimmermann zimmermann@imbi.uni-heidelberg.de (ORCID)
Johannes Vey vey@imbi.uni-heidelberg.de (ORCID)
Other contributors:
Institute of Medical Biometry - University of Heidelberg [copyright holder]
Useful links:
Report bugs at https://github.com/imbi-heidelberg/MetaNLP/issues
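# Locate the example CSV file shipped with the package
path <- system.file("extdata", "test_data.csv", package = "MetaNLP", mustWork = TRUE)
# Create the MetaNLP object; the word counts are stored in the slot data_frame
obj <- MetaNLP(path)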