MetaNLP: Natural Language Processing for Meta Analysis


View source: R/MetaNLP.R

MetaNLP

R Documentation

Natural Language Processing for Meta Analysis

Description

The MetaNLP package provides methods to quickly transform a CSV file of titles and abstracts into an R data frame that can be used for automatic title-abstract screening with machine learning.

MetaNLP is the base class of the package. An object is initialized by passing the path to a CSV file; the constructor builds a data frame whose column names are the words that occur in the titles and abstracts and whose cells contain the word frequencies for each paper.
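The shape of such a word-frequency data frame can be sketched in base R. This is a toy illustration only, not the package's implementation: it skips lemmatization and stemming, the names `docs`, `tokens`, `vocab` and `dtm` are made up for the example, and it assumes that "number of appearances" means the total count of a word across all papers.

```r
# Toy illustration only: MetaNLP additionally lemmatizes and stems the words.
docs <- c(
  paper1 = "Deep learning for screening abstracts",
  paper2 = "Screening abstracts with machine learning"
)

# Lowercase each text and split it into words on non-letter characters.
tokens <- lapply(tolower(docs), function(x) {
  words <- strsplit(x, "[^a-z]+")[[1]]
  words[nzchar(words)]
})

# Collect the vocabulary and count each word per document.
vocab <- sort(unique(unlist(tokens)))
counts <- t(vapply(
  tokens,
  function(words) as.integer(table(factor(words, levels = vocab))),
  integer(length(vocab))
))
colnames(counts) <- vocab

# One row per paper, one column per word, cells hold word frequencies.
dtm <- as.data.frame(counts)

# Assumed reading of bounds = c(2, Inf) and word_length = c(3, Inf):
# keep words that appear at least twice overall and have at least 3 characters.
keep <- colSums(dtm) >= 2 & nchar(names(dtm)) >= 3
dtm_filtered <- dtm[, keep, drop = FALSE]
```

In this toy data only the words shared by both papers survive the filter, which is the kind of pruning the `bounds` argument is meant to control.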

Usage

MetaNLP(
  file,
  bounds = c(2, Inf),
  word_length = c(3, Inf),
  language = "english",
  ...
)

Arguments

`file`	Either the path to the CSV file or a data frame containing the abstracts
`bounds`	An integer vector of length 2. The first value specifies the minimum number of appearances of a word to become a column of the word count matrix, the second value specifies the maximum number. Defaults to `c(2, Inf)`.
`word_length`	An integer vector of length 2. The first value specifies the minimum number of characters of a word to become a column of the word count matrix, the second value specifies the maximum number. Defaults to `c(3, Inf)`.
`language`	The language for lemmatization and stemming. Supported languages are `english`, `french`, `german`, `russian` and `spanish`. For non-English languages, make sure that the CSV file being processed has the correct encoding.
`...`	Additional arguments passed on to `read.csv2`, e.g. when "," should be used as a separator or when the encoding should be changed. See `read.table`.
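Why the separator may need to be overridden can be seen with base R alone. The sketch below uses a made-up temporary file; the commented-out `MetaNLP()` call at the end follows the argument description above and is not executed here.

```r
# Write a small comma-separated file to a temporary location.
csv_path <- tempfile(fileext = ".csv")
writeLines(
  c(
    "ID,title,abstract",
    "1,First title,First abstract",
    "2,Second title,Second abstract"
  ),
  csv_path
)

# read.csv2 defaults to sep = ";", so each line ends up in a single column.
wrong <- read.csv2(csv_path)

# Overriding the separator yields the three expected columns.
right <- read.csv2(csv_path, sep = ",")

# With MetaNLP installed, the same argument would be forwarded through `...`:
# obj <- MetaNLP::MetaNLP(csv_path, sep = ",")
```

Checking `ncol()` of the data read in this way is a quick test of whether the separator matches the file.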

Details

An object of class MetaNLP contains a slot data_frame in which the document-term matrix is stored as a data frame. The CSV file must have a column ID that identifies each paper, a column title with the corresponding paper titles, and a column abstract with the abstracts. If the CSV file stores training data, it should also have a column decision indicating whether an abstract is included in the meta analysis. This column is optional, because no decisions exist yet for test data. Allowed values in this column are "yes" and "no", "include" and "exclude", or "maybe". The value "maybe" is handled as "yes"/"include".
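The column layout described above can be checked before a file is handed to the package. The helper `check_screening_data` below is hypothetical and not part of MetaNLP; it only encodes the requirements stated in this section and does not enforce that "yes"/"no" and "include"/"exclude" are used consistently.

```r
# Hypothetical helper, not part of MetaNLP: checks the documented column layout.
check_screening_data <- function(df) {
  required <- c("ID", "title", "abstract")
  missing_cols <- setdiff(required, names(df))
  if (length(missing_cols) > 0) {
    stop("Missing required column(s): ", paste(missing_cols, collapse = ", "))
  }
  # The decision column is optional (it is absent for unlabeled test data).
  if ("decision" %in% names(df)) {
    allowed <- c("yes", "no", "include", "exclude", "maybe")
    bad <- setdiff(unique(df$decision), allowed)
    if (length(bad) > 0) {
      stop("Unexpected decision value(s): ", paste(bad, collapse = ", "))
    }
  }
  invisible(TRUE)
}

# Made-up training data with the documented columns.
training <- data.frame(
  ID = 1:3,
  title = c("Title A", "Title B", "Title C"),
  abstract = c("Abstract A", "Abstract B", "Abstract C"),
  decision = c("include", "exclude", "maybe")
)
check_screening_data(training)

# Write with ";" as separator, matching the read.csv2 default.
csv_path <- tempfile(fileext = ".csv")
write.csv2(training, csv_path, row.names = FALSE)
```

A file written this way uses the semicolon separator that `read.csv2` expects by default, so no extra arguments should be needed when reading it back.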

Value

An object of class MetaNLP

Note

To ensure correct processing of the data when there are special characters (e.g. "é" or "ü"), make sure that the CSV file is correctly encoded as UTF-8. The stemming algorithm makes use of the C libstemmer library generated by Snowball. When German texts are stemmed, umlauts are replaced by their non-umlaut equivalents, so "ä" becomes "a", etc.

Author(s)

Maintainer: Maximilian Pilz maximilian.pilz@itwm.fraunhofer.de (ORCID)

Authors:

Nico Bruder brudernico@gmail.com (ORCID)
Samuel Zimmermann zimmermann@imbi.uni-heidelberg.de (ORCID)
Johannes Vey vey@imbi.uni-heidelberg.de (ORCID)

Other contributors:

Institute of Medical Biometry - University of Heidelberg [copyright holder]

Examples

path <- system.file("extdata", "test_data.csv", package = "MetaNLP", mustWork = TRUE)
obj <- MetaNLP(path)

MetaNLP documentation built on April 4, 2025, 5:11 a.m.
