findGenes: Find gene names


View source: R/findGenes.R

Description

Find gene names in PMC text or tables using pattern matching

Usage

findGenes(txt)

Arguments

txt

A PMC text or table object (output from pmcText or pmcTable)

Details

Finds gene names in pmcText or pmcTable output. PMC text should be split into sentences before calling this function.

Value

A data.frame with columns id, source, gene and mention

Note

Matches words whose second and third letters are lower case and whose fourth letter is upper case. Because a fourth letter is required, three-letter gene names are not found.
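
The matching rule in this note can be illustrated with a short base R sketch. The pattern below is an assumption written from the description above, not the expression used in R/findGenes.R; in particular, the case of the first letter and the handling of characters after the fourth letter are guesses. The names gene_pattern, sentences and hits are hypothetical.

```r
# Hypothetical pattern (not taken from the pmcXML source): any first letter,
# two lower-case letters, one upper-case letter, then optional word characters.
gene_pattern <- "\\b[A-Za-z][a-z]{2}[A-Z]\\w*"

# Example sentences made up for illustration
sentences <- c(
  "Expression of hrpA and hrpB was reduced in the rpoN mutant.",
  "The lac operon was unaffected."
)

# Extract every match from each sentence; the result is a list with
# one character vector per sentence
hits <- regmatches(sentences, gregexpr(gene_pattern, sentences, perl = TRUE))

hits[[1]]  # "hrpA" "hrpB" "rpoN"
hits[[2]]  # character(0): the three-letter name "lac" is not matched
```

The second sentence shows the limitation stated in the note: a three-letter name such as lac has no fourth upper-case letter, so a pattern of this kind skips it.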

Author(s)

Chris Stubben

Examples

## Not run: 
doc <- pmcOAI("PMC2231364")
txt <- pmcText(doc)
y <- findGenes(txt)
table(y$gene)

## italics (including tables)
x <- gdata::trim(XML::xpathSApply(doc, "//body//italic", XML::xmlValue))
## most of these are not genes...
table(x[nchar(x) == 3])

## End(Not run)

cstubben/pmcXML documentation built on May 14, 2019, 12:25 p.m.