annotate: 'annotate'


Description

This method annotates the entities contained in a data frame with the concepts from a specific dictionary.

Usage

annotate(input = NA, dictType = NA, dictionary = NA, ...)

## S4 method for signature 'data.frame,character,character'
annotate(input,
  dictType = "OBO", dictionary = NA, dictoutdir = getwd(),
  d_synonymtype = "EXACT", taxID = 0, annot_out = getwd(),
  paramValueIndex = NA, SearchStrategy = "CONTIGUOUS_MATCH",
  CaseMatch = "CASE_INSENSITIVE", Stemmer = "NONE",
  StopWords = "NONE", OrderIndependentLookup = "ON",
  FindAllMatches = "YES", e_synonymtype = "ALL",
  multipleDocs = FALSE, disease = FALSE)

Arguments

input

A data frame whose first column contains the ID of each sample or document to annotate.
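As a minimal sketch, the data frame below follows the rule stated above: its first column holds the sample or document IDs. The column names, the IDs and the `description` text column are hypothetical. They only illustrate the expected shape, assuming the remaining columns carry the free text to be annotated.

```r
# Hypothetical input for annotate(): one row per sample/document.
# Column 1 must hold the sample or document ID; the "description" column
# is an assumed free-text field to be annotated.
input_df <- data.frame(
  sample_id = c("SAMPLE_1", "SAMPLE_2"),
  description = c("H3K4me3 ChIP-seq in breast cancer cells",
                  "Input control from healthy liver tissue"),
  stringsAsFactors = FALSE
)

# Basic checks: a data frame whose first column has unique IDs.
stopifnot(is.data.frame(input_df), !anyDuplicated(input_df[[1]]))
```

Such an object would then be passed as `annotate(input_df, 'OBO', dictionary = obo)`, with `obo` defined as in the Examples section.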

dictType

The type of input dictionary, one of:

OBO

A dictionary created from an OBO file

ENTREZ

Entrez genes dictionary

TARGET

Entrez genes dictionary plus histone marks and histone modifications

CMDICT

A previously created dictionary file in the Conceptmapper XML format

dictionary

The local OBO/OWL ontology to be converted into an XML Conceptmapper dictionary, or the URL from which to download it. If NA is passed and dictType is not the default OBO, the method tries to download the corresponding dictionary from the available repositories. For the ENTREZ and TARGET dictionary types, a file named gene_info.gz can be downloaded automatically from ftp://ncbi.nlm.nih.gov/gene/data/gene_info.gz if its path is not provided by the user in this parameter. Alternatively, a Bioconductor annotation package of the type org.Xx.eg.db can be used; in this case the gene identifiers and their alternative names are retrieved from the annotation database, without the need to download a gene_info file.

...

Optional parameters

dictoutdir

Optional parameter to specify the location where the Conceptmapper dictionary file will be stored. Defaults to current working directory.

d_synonymtype

Optional parameter to specify the type of synonyms to consider when building the dictionary for Conceptmapper. For further details see http://owlcollab.github.io/oboformat/doc/obo-syntax.html. Default: EXACT

  • EXACT Only exact synonyms are included

  • ALL All synonym types are included

taxID

The taxonomy identifier of the organism, used when dictType = 'ENTREZ' or 'TARGET' and the dictionary parameter refers to a gene_info.gz file. If 0, genes from all taxonomies are included in the new dictionary.
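The taxID rule can be illustrated with a small base R sketch. The `filter_by_taxid` helper and the three-row table are hypothetical stand-ins for a few gene_info.gz columns; they mirror the documented behaviour (0 keeps everything, any other value keeps one organism) and are not the code Onassis runs internally.

```r
# Toy stand-in for a few gene_info columns (values are illustrative only).
gene_info <- data.frame(
  tax_id = c(9606, 9606, 10090),
  GeneID = c(7157, 672, 22059),
  Symbol = c("TP53", "BRCA1", "Trp53"),
  stringsAsFactors = FALSE
)

# Hypothetical helper mirroring the documented taxID rule:
# 0 keeps every organism, any other value keeps only that taxonomy.
filter_by_taxid <- function(genes, taxID = 0) {
  if (taxID == 0) genes else genes[genes$tax_id == taxID, , drop = FALSE]
}

nrow(filter_by_taxid(gene_info, 0))     # 3
nrow(filter_by_taxid(gene_info, 9606))  # 2
```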

annot_out

The path of the output directory where the Conceptmapper annotation files will be stored. Defaults to the current working directory.

paramValueIndex

An integer value that indexes one of the 576 Conceptmapper parameter combinations.

SearchStrategy

The matching strategy for finding concepts in the input text:

  • CONTIGUOUS_MATCH Longest match of contiguous tokens within the enclosing span

  • SKIP_ANY_MATCH Longest match of not necessarily contiguous tokens

  • SKIP_ANY_MATCH_ALLOW_OVERLAP Longest match of not necessarily contiguous tokens; overlapping matches are allowed

CaseMatch

The case-folding strategy applied before matching:

  • CASE_IGNORE Fold everything to lowercase for matching

  • CASE_INSENSITIVE Fold only tokens with initial caps to lowercase

  • CASE_FOLD_DIGITS Fold all (and only) tokens containing a digit

  • CASE_SENSITIVE Perform no case folding
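The four CaseMatch rules can be compared side by side with a hypothetical `fold_tokens` helper. This is a sketch of the described behaviour, not Conceptmapper's tokenizer. In particular, "initial caps" is interpreted here as "first letter upper case and no other upper-case letters", which is an assumption.

```r
# Hypothetical helper illustrating the four CaseMatch rules on a token vector.
# "Initial caps" is interpreted here as: first letter upper case, no other
# upper-case letters (e.g. "Breast" but not "BRCA1"); this is an assumption.
fold_tokens <- function(tokens, mode) {
  switch(mode,
    CASE_IGNORE = tolower(tokens),
    CASE_INSENSITIVE = ifelse(grepl("^[[:upper:]][^[:upper:]]*$", tokens),
                              tolower(tokens), tokens),
    CASE_FOLD_DIGITS = ifelse(grepl("[[:digit:]]", tokens),
                              tolower(tokens), tokens),
    CASE_SENSITIVE = tokens,
    stop("unknown CaseMatch mode: ", mode)
  )
}

tokens <- c("Breast", "BRCA1", "cancer", "H3K4me3")
fold_tokens(tokens, "CASE_INSENSITIVE")  # "breast" "BRCA1" "cancer" "H3K4me3"
fold_tokens(tokens, "CASE_FOLD_DIGITS")  # "Breast" "brca1" "cancer" "h3k4me3"
```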

Stemmer

The stemmer applied to tokens before matching:

  • BIOLEMMATIZER A lemmatizer designed for biomedical literature

  • PORTER The Porter stemmer, which removes the more common morphological and inflectional endings from English words

  • NONE No word stemming

StopWords

The list of stop words to ignore during matching:

  • PUBMED A list of stop words obtained by analyzing PubMed papers

  • NONE No stop words

OrderIndependentLookup

Whether token order within a span is taken into account:

  • ON Ordering within the span is ignored (e.g. 'Breast cancer' matches 'Cancer breast')

  • OFF Ordering is taken into account
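One simple way to picture OrderIndependentLookup = "ON" is to compare sorted token sets. The sketch below uses hypothetical helpers. It lowercases tokens only so that the 'Breast cancer' example above works on its own, and it illustrates the idea rather than Conceptmapper's actual lookup.

```r
# Sketch of order-independent comparison: lowercase, split on whitespace,
# sort the tokens, and compare the resulting canonical strings.
# Illustrative only; this is not Conceptmapper's implementation.
normalise_span <- function(x) {
  tokens <- strsplit(tolower(x), "[[:space:]]+")[[1]]
  paste(sort(tokens), collapse = " ")
}

order_independent_equal <- function(a, b) {
  normalise_span(a) == normalise_span(b)
}

order_independent_equal("Breast cancer", "Cancer breast")  # TRUE
tolower("Breast cancer") == tolower("Cancer breast")       # FALSE (order matters)
```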

FindAllMatches

Whether to report every match within a span or only the longest one:

  • YES All matches within the span are found

  • NO Only the longest match within the span is returned

e_synonymtype

The type of synonyms used by the EntityFinder:

  • EXACT_ONLY Only exact synonyms are considered

  • ALL All synonym types are included
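The option lists for SearchStrategy, CaseMatch, Stemmer, StopWords, OrderIndependentLookup, FindAllMatches and e_synonymtype have 3, 4, 3, 2, 2, 2 and 2 values. Their product, 576, matches the number of combinations that paramValueIndex refers to. The sketch below enumerates them with base R `expand.grid`. The row order it produces is an assumption and may not match the ordering Onassis uses internally for paramValueIndex.

```r
# Enumerate every combination of the Conceptmapper options documented above.
# The row order produced here is an assumption and is NOT guaranteed to match
# the ordering Onassis uses internally for paramValueIndex.
param_grid <- expand.grid(
  SearchStrategy = c("CONTIGUOUS_MATCH", "SKIP_ANY_MATCH",
                     "SKIP_ANY_MATCH_ALLOW_OVERLAP"),
  CaseMatch = c("CASE_IGNORE", "CASE_INSENSITIVE",
                "CASE_FOLD_DIGITS", "CASE_SENSITIVE"),
  Stemmer = c("BIOLEMMATIZER", "PORTER", "NONE"),
  StopWords = c("PUBMED", "NONE"),
  OrderIndependentLookup = c("ON", "OFF"),
  FindAllMatches = c("YES", "NO"),
  e_synonymtype = c("EXACT_ONLY", "ALL"),
  stringsAsFactors = FALSE
)

nrow(param_grid)  # 3 * 4 * 3 * 2 * 2 * 2 * 2 = 576
param_grid[1, ]   # one candidate setting
```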

multipleDocs

TRUE when multiple documents are loaded from a single file in which each row represents a document. The file should have two columns: the first holds the unique document identifier and the second the textual description.
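A file with the two-column layout described above can be produced and checked in base R. In this sketch the tab separator, the missing header row and the example texts are assumptions. The documentation only fixes the column order (document ID first, text second).

```r
# Hypothetical two-column file for multipleDocs = TRUE: one document per row,
# column 1 = unique document ID, column 2 = text. The tab separator and the
# absence of a header row are assumptions.
docs_file <- tempfile(fileext = ".txt")
docs <- data.frame(
  id = c("doc1", "doc2", "doc3"),
  text = c("Histone H3K27ac ChIP-seq in HeLa cells",
           "RNA-seq of peripheral blood from healthy donors",
           "DNase-seq in K562 erythroleukemia cells"),
  stringsAsFactors = FALSE
)
write.table(docs, docs_file, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)

# Read it back to confirm the layout: exactly two columns, unique IDs.
check <- read.delim(docs_file, header = FALSE, stringsAsFactors = FALSE)
stopifnot(ncol(check) == 2, !anyDuplicated(check[[1]]))
```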

disease

A logical value set to TRUE if the annotation requires the 'Healthy' condition to be found.

Value

An instance of class Onassis-class containing the annotated entities.

Examples

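library(Onassis)

# Load the sample GEO ChIP-seq metadata shipped with Onassis
geo_chip <- readRDS(system.file('extdata', 'vignette_data',
    'GEO_human_chip.rds', package='Onassis'))

# Annotate the metadata with the sample OBO ontology shipped with OnassisJavaLibs
obo <- system.file('extdata', 'sample.cs.obo', package='OnassisJavaLibs')
onassis_results <- annotate(geo_chip, 'OBO', dictionary=obo)

# Extract the annotated entities and inspect a random subset of up to 30 rows
entities <- entities(onassis_results)
entities <- entities[sample(nrow(entities), min(30, nrow(entities))),]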

Onassis documentation built on Nov. 8, 2020, 8:18 p.m.