readGEOAnn: Function to extract data from the GEO web site

View source: R/readGEOAnn.R

readGEOAnnR Documentation

Function to extract data from the GEO web site

Description

Data files that are available at GEO web site are identified by GEO accession numbers. Given the url for the CGI script at GEO and a GEO accession number, the functions extract data from the web site and returns a matrix containing the data.

Usage

readGEOAnn(GEOAccNum, url = "https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?")
readIDNAcc(GEOAccNum, url = "https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?")
getGPLNames(url ="https://www.ncbi.nlm.nih.gov/geo/query/browse.cgi?") 
getSAGEFileInfo(url =
                       "https://www.ncbi.nlm.nih.gov/geo/query/browse.cgi?view=platforms&prtype=SAGE&dtype=SAGE")
getSAGEGPL(organism = "Homo sapiens", enzyme = c("NlaIII", "Sau3A"))
readUrl(url)

Arguments

url

url the url for the CGI script at GEO

GEOAccNum

GEOAccNum a character string for the GEO accession number of a desired file (e. g. GPL97)

organism

organism a character string for the name of the organism of interests

enzyme

enzyme a character string that can be eighter NlaII or Sau3A for the enzyme used to create SAGE tags

Details

url is the CGI script that processes user's request. readGEOAnn invokes the CGI by passing a GEO accession number and then processes the data file obtained.

readIDNAcc calls readGEOAnn to read the data and the extracts the columns for probe ids and accession numbers. The GEOAccNum has to be the id for an Affymetrix chip.

getGPLNames parses the html file that lists GEO accession numbers and descriptions of the array represented by the corresponding GEO accession numbers.

Value

Both readGEOAnn and readIDNAcc return a matrix.

getGPLNames returns a named vector of the names of commercial arrays. The names of the vector are the corresponding GEO accession number.

Author(s)

Jianhua Zhang

References

www.ncbi.nlm.nih.gov/geo

Examples

# Get array names and GEO accession numbers
#geoAccNums <- getGPLNames()
# Read the annotation data file for HG-U133A which is GPL96 based on
# examining geoAccNums 
#temp <- readGEOAnn(GEOAccNum = "GPL96")
#temp2 <- readIDNAcc(GEOAccNum = "GPL96")

Bioconductor/annotate documentation built on Nov. 2, 2024, 4:40 p.m.