processData: Convert Data Format

Description

Convert data formats with R functions, or generate a Perl program to process the data.

Usage

getBaseParsers(baseMapType, db=FALSE)

fileMuncher(outName, dataFile, parser, organism)
fileMuncher_DB(dataFile, parser, organism)

writeInput(parser, perlName, organism, dataFile)
writeInputSP(perlName, organism)
writeInputIPI(perlName, organism)
writeInputREFSEQ(perlName, organism)
writeInputBLAST(perlName, organism, dataFile)
writeInputPFAM(perlName, organism)
writeInputINTERPRO(perlName, organism)
writeOutput(parser, perlName)
.callPerl(script, os)

getSrcObjs(srcUrls, organism, built, fromWeb = TRUE)
getBaseData(srcObjs)

splitEntry(dataRow, sep = ";", asNumeric = FALSE)
twoStepSplit(dataRow, entrySep = ";", eleSep = "@", asNumeric = FALSE)
mergeRowByKey(mergeMe, keyCol = 1, sep = ";")

Arguments

baseMapType

a character string indicating which database will be parsed. It can be "sp", "trembl", "ipi", "refseq", "equal", "merge", "mppi", "PeptideAtlas", "DBSubLoc", "Pfam", "pfamname", "prositede" or "blast".

db

a boolean to indicate whether the parser file for the SQLite-based package will be returned.

outName

a character string for the output file name of the Perl program.

dataFile

a character string for the input file name of the Perl program.

parser

a character string for the path of the parser file.

organism

a character string for the name of the organism of concern (e.g. "Homo sapiens").

perlName

a character string for the name of the Perl program.

script

a character string for the name of the Perl program.

os

a character string giving the operating system (family) of the computer.

srcUrls

a character string giving the URL of the database of concern.

built

a character string for the release/version information of source data.

fromWeb

a boolean to indicate whether the source data will be downloaded from the web or read from a local file.

srcObjs

an object of class "pBase".

dataRow

a character vector, each element of which is to be split.

sep

a character string containing regular expression(s) to use as "split".

asNumeric

a boolean to indicate whether the elements will be converted to objects of type "numeric".

entrySep

a character string containing regular expression(s) to use in the first "split".

eleSep

a character string containing regular expression(s) to use in the second "split".

mergeMe

a vector or a matrix in which duplicated values for the same id will be merged.

keyCol

an integer indicating the index of the column to be regarded as the key.

Details

These functions are derived from the Bioconductor "AnnBuilder" package, with many new operations added to meet the requirements of building proteomic annotation data packages.

getBaseParsers returns a character string giving the name of the parser file for the given database. Each parser file is a fragment of a Perl script and is used to parse the relevant data.

fileMuncher produces a Perl file from the given parser and additional input files, then runs this Perl program from R. fileMuncher_DB does the same but is designed for SQLite-based annotation packages; the resulting data are stored in the corresponding output files. writeInput writes additional information, including the input files, into the Perl script. writeOutput writes information about the output files into the Perl script. .callPerl runs a Perl program from R.
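The idea of combining a parser fragment with input and output information can be illustrated with a minimal base R sketch. The helper name assembleScript, the file names, and the layout of the generated Perl are assumptions made for illustration only; they are not the package's actual code or script format, and the sketch only writes the script without running it.

```r
# Hypothetical helper (not part of PAnnBuilder): wrap a parser fragment
# with lines that open the input file and the output file.
assembleScript <- function(parserLines, dataFile, outName) {
  header <- c(
    "#!/usr/bin/perl -w",
    sprintf("open(IN, \"<%s\") || die \"cannot open input\";", dataFile),
    sprintf("open(OUT, \">%s\") || die \"cannot open output\";", outName)
  )
  footer <- c("close(IN);", "close(OUT);")
  c(header, parserLines, footer)
}

# A toy parser fragment that copies every input line to the output.
parserLines <- c("while (<IN>) {", "  print OUT $_;", "}")

script <- assembleScript(parserLines, "input.txt", "output.txt")

# Write the assembled script to a temporary file.
scriptFile <- tempfile(fileext = ".pl")
writeLines(script, scriptFile)
```

A script written this way could then be passed to an external Perl interpreter, which is the role the text above assigns to .callPerl.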

getSrcObjs defines objects of class "pBase" from the given database URLs and the organism of concern. pBase is a subclass of "pubRepo" and is used for SwissProt, TREMBL, IPI and NCBI RefSeq data. getBaseData gets basic protein annotation data and sequence data from the protein databases SwissProt, TREMBL, IPI and NCBI RefSeq.

splitEntry splits multiple entries for a given mapping. twoStepSplit splits multiple entries with two separators (e.g. 12345@18;67891@18). mergeRowByKey merges duplicated values for the same key.
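The two kinds of splitting can be sketched in base R as follows. The functions splitSketch and twoStepSketch are hypothetical stand-ins written for illustration, not the PAnnBuilder source; in particular, naming the result by the element after the second separator is an assumption about a convenient return shape.

```r
# Hypothetical stand-in for a one-step split on a single separator.
splitSketch <- function(dataRow, sep = ";", asNumeric = FALSE) {
  parts <- unlist(strsplit(dataRow, sep))
  if (asNumeric) as.numeric(parts) else parts
}

# Hypothetical stand-in for a two-step split: first on entrySep,
# then each entry on eleSep.
twoStepSketch <- function(dataRow, entrySep = ";", eleSep = "@") {
  entries <- unlist(strsplit(dataRow, entrySep))
  pieces <- strsplit(entries, eleSep)
  ids <- vapply(pieces, function(p) p[1], character(1))
  tags <- vapply(pieces, function(p) p[2], character(1))
  # Assumption: return the first elements, named by the second elements.
  stats::setNames(ids, tags)
}

ids <- splitSketch("12345;67891", asNumeric = TRUE)
pairs <- twoStepSketch("12345@18;67891@18")
```

Here ids holds the two numbers 12345 and 67891, and pairs holds the strings "12345" and "67891", each named "18".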

Value

getBaseParsers returns the path of the parser file.

getSrcObjs returns a list of objects of class "pBase".

getBaseData returns a matrix of protein annotation data.

splitEntry returns a vector.

twoStepSplit returns a vector.

mergeRowByKey returns a data frame containing the merged values.
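Merging duplicated values by key can be sketched in base R as below. The function mergeSketch and the example data are hypothetical and only illustrate the general technique (collapse the unique values of each column within a key, joined by sep); the package's own handling of edge cases such as missing values may differ.

```r
# Hypothetical stand-in (not the PAnnBuilder source): collapse rows that
# share the same key into one row, joining distinct values with sep.
mergeSketch <- function(mergeMe, keyCol = 1, sep = ";") {
  mergeMe <- as.matrix(mergeMe)
  keys <- unique(mergeMe[, keyCol])
  rows <- lapply(keys, function(k) {
    block <- mergeMe[mergeMe[, keyCol] == k, , drop = FALSE]
    # Within one key, paste the unique values of every column together.
    apply(block, 2, function(col) paste(unique(col), collapse = sep))
  })
  as.data.frame(do.call(rbind, rows), stringsAsFactors = FALSE)
}

# Toy input: protein "P1" appears twice with different annotations.
m <- cbind(id = c("P1", "P1", "P2"),
           go = c("GO:1", "GO:2", "GO:3"))
merged <- mergeSketch(m)
```

In this toy case merged has two rows: "P1" with "GO:1;GO:2", and "P2" with "GO:3".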

Author(s)

Hong Li

References

Zhang, J., Carey, V., Gentleman, R. (2003) An extensible application for assembling annotation for genomic data. Bioinformatics 19(1), 155-156.


PAnnBuilder documentation built on May 2, 2018, 4:07 a.m.