annotate: 'annotate'
In Onassis: OnASSIs Ontology Annotation and Semantic SImilarity software


This method annotates the entities contained in a data frame with the concepts from a specific dictionary.

annotate(input = NA, dictType = NA, dictionary = NA, ...)

## S4 method for signature 'data.frame,character,character'
annotate(input,
  dictType = "OBO", dictionary = NA, dictoutdir = getwd(),
  d_synonymtype = "EXACT", taxID = 0, annot_out = getwd(),
  paramValueIndex = NA, SearchStrategy = "CONTIGUOUS_MATCH",
  CaseMatch = "CASE_INSENSITIVE", Stemmer = "NONE",
  StopWords = "NONE", OrderIndependentLookup = "ON",
  FindAllMatches = "YES", e_synonymtype = "ALL",
  multipleDocs = FALSE, disease = FALSE)

`input`	A data frame where the first column is the ID of the sample or document to annotate
`dictType`	The type of input dictionary. OBO: a dictionary created from an OBO file. ENTREZ: Entrez genes dictionary. TARGET: Entrez genes dictionary, histone marks and histone modifications. CMDICT: a previously created dictionary file in the Conceptmapper XML format.
`dictionary`	The local OBO/OWL ontology to be converted into an XML Conceptmapper dictionary, or the URL from which to download the file. If NA is passed and the `dictType` parameter is not the default OBO, the method tries to download the corresponding dictionary from the available repositories. For the ENTREZ and TARGET dictionary types, a file named gene_info.gz can be downloaded automatically from ftp://ncbi.nlm.nih.gov/gene/data/gene_info.gz if its path is not provided by the user in this parameter. Alternatively, a Bioconductor annotation package of the type `org.xx.eg.db` can be used. In this case the gene identifiers and their alternative names are retrieved from the annotation database, without the need to download a gene_info file.
`...`	Optional parameters
`dictoutdir`	Optional parameter to specify the location where the Conceptmapper dictionary file will be stored. Defaults to current working directory.
`d_synonymtype`	Optional parameter to specify the type of synonyms to consider when building the dictionary for Conceptmapper. For further detail see http://owlcollab.github.io/oboformat/doc/obo-syntax.html. Allowed values: EXACT, ALL. Default: EXACT.
`taxID`	The taxonomy identifier of the organism, used when `dictType` is 'ENTREZ' or 'TARGET' and the `dictionary` parameter refers to a gene_info.gz file. If 0, all taxonomies are included in the new dictionary.
`annot_out`	The path of the output directory where Conceptmapper annotation files will be stored
`paramValueIndex`	An integer value to index the 576 parameter combinations
`SearchStrategy`	The matching strategy for finding concepts in the input text. CONTIGUOUS_MATCH: longest match of contiguous tokens within the enclosing span. SKIP_ANY_MATCH: longest match of not-necessarily contiguous tokens. SKIP_ANY_MATCH_ALLOW_OVERLAP: longest match of not-necessarily contiguous tokens, overlapping matches are allowed.
`CaseMatch`	The case folding applied before matching. CASE_IGNORE: fold everything to lowercase for matching. CASE_INSENSITIVE: fold only tokens with initial caps to lowercase. CASE_FOLD_DIGITS: fold all (and only) tokens with a digit. CASE_SENSITIVE: perform no case folding.
`Stemmer`	The word stemmer to apply. BIOLEMMATIZER: a stemmer specific for biomedical literature. PORTER: a stemmer that removes the more common morphological and inflexional endings from words in English. NONE: no word stemming.
`StopWords`	The stop word list to use. PUBMED: a list of stop words obtained by analyzing PubMed papers. NONE: no stop words.
`OrderIndependentLookup`	ON: ordering within the span is ignored (i.e. 'Breast cancer' would equal 'Cancer breast'). OFF: ordering is taken into consideration.
`FindAllMatches`	YES: all the matches within the span are found. NO: only the longest match within the span is returned.
`e_synonymtype`	The type of synonyms for the EntityFinder. EXACT_ONLY: only exact synonyms are considered. ALL: all synonym types are included.
`multipleDocs`	TRUE when multiple documents are loaded from a single file with each row representing a document. The file should have two columns. The first for the unique document identifier and the second for the textual descriptions
`disease`	A logical value set to TRUE if the annotation requires the 'Healthy' condition to be found.
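
The figure of 576 quoted for `paramValueIndex` matches the number of ways the seven matching options above can be combined (3 x 4 x 3 x 2 x 2 x 2 x 2). The following is a minimal sketch in base R that enumerates them. It only checks the count: the row order of this grid is an assumption made for illustration and is not necessarily the order Onassis uses to resolve `paramValueIndex`.

```r
# Option values copied from the argument descriptions above.
# NOTE: the row order produced by expand.grid() is illustrative only;
# it is NOT claimed to match the internal paramValueIndex numbering.
options_grid <- expand.grid(
  SearchStrategy = c("CONTIGUOUS_MATCH", "SKIP_ANY_MATCH",
                     "SKIP_ANY_MATCH_ALLOW_OVERLAP"),
  CaseMatch = c("CASE_IGNORE", "CASE_INSENSITIVE",
                "CASE_FOLD_DIGITS", "CASE_SENSITIVE"),
  Stemmer = c("BIOLEMMATIZER", "PORTER", "NONE"),
  StopWords = c("PUBMED", "NONE"),
  OrderIndependentLookup = c("ON", "OFF"),
  FindAllMatches = c("YES", "NO"),
  e_synonymtype = c("EXACT_ONLY", "ALL"),
  stringsAsFactors = FALSE
)

# One row per combination of the seven options
n_combinations <- nrow(options_grid)
```

Passing the individual options by name, as in the usage signature, avoids depending on any particular index order.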

An instance of class Onassis-class with the annotated entities.

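# Load the example GEO data frame (GEO_human_chip.rds) shipped with Onassis
geo_chip <- readRDS(system.file('extdata', 'vignette_data',
                                'GEO_human_chip.rds', package='Onassis'))

# Path to the sample OBO ontology shipped with OnassisJavaLibs
obo <- system.file('extdata', 'sample.cs.obo', package='OnassisJavaLibs')

# Annotate the data frame with concepts from the OBO dictionary
onassis_results <- annotate(geo_chip, 'OBO', dictionary=obo)

# Extract the annotated entities and keep a random subset of 30 rows
entities <- entities(onassis_results)
entities <- entities[sample(nrow(entities), 30),]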
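
The last line of the example draws a random subset, so the rows shown differ between runs, and `sample()` raises an error if fewer than 30 rows are available. The sketch below shows one way to make such a subset reproducible and size-safe. It uses a toy data frame: `toy_entities` and its column names are hypothetical stand-ins, not the actual structure returned by `entities()`.

```r
# Hypothetical stand-in for the table of annotated entities;
# the column names are assumptions made for this sketch only.
toy_entities <- data.frame(
  sample_id = paste0("S", 1:50),
  term_name = paste0("term_", 1:50),
  stringsAsFactors = FALSE
)

# Never ask for more rows than the table has
n_show <- min(30, nrow(toy_entities))

# Fixing the seed makes the random subset reproducible
set.seed(123)
subset_a <- toy_entities[sample(nrow(toy_entities), n_show), ]

# Resetting the same seed yields the same rows again
set.seed(123)
subset_b <- toy_entities[sample(nrow(toy_entities), n_show), ]
```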