getMipsInfo: A function that reads the downloaded text file from the MIPS...


View source: R/getMipsInfo.R

Description

This function reads the downloaded text file from the MIPS database and parses it for those collections of proteins referred to as a "complex", an "-ase" (e.g. RNA Polymerase), or a "-some" (e.g. ribosome), and/or for user-supplied terms identifying the protein complexes of interest. It returns a list containing two items: a named list of protein complexes and a character vector (of the same length as the named list) describing each protein complex.

Usage

getMipsInfo(wantDefault = TRUE, toGrep = NULL, parseType = NULL,
            eCode = c("901.01.03", "901.01.03.01", "901.01.03.02",
                      "901.01.04", "901.01.04.01", "901.01.04.02",
                      "901.01.05", "901.01.05.01", "901.01.05.02",
                      "902.01.09.02", "902.01.01.02.01.01",
                      "902.01.01.02.01.01.01", "902.01.01.02.01.01.02",
                      "902.01.01.02.01.02", "902.01.01.02.01.02.01",
                      "902.01.01.02.01.02.02", "902.01.01.04",
                      "902.01.01.04.01", "902.01.01.04.01.01",
                      "902.01.01.04.01.02", "902.01.01.04.01.03",
                      "902.01.01.04.02", "901.01.09.02"),
            wantSubComplexes = TRUE, ht = FALSE, dubiousGenes = NULL)

Arguments

wantDefault

A logical. If TRUE, the default patterns "complex", "\Base\b" and "\Bsome\b" are grepped.

toGrep

A character vector. Each entry is a term, optionally written as a Perl regular expression, to be searched for in the MIPS text file.

parseType

A character vector. Each entry tells how the corresponding entry of toGrep should be parsed, e.g. "grep" or "agrep".

eCode

A character vector of MIPS evidence codes. The codes are listed in the file evidence.scheme found in the inst/extdata section of the package.

wantSubComplexes

A logical. If FALSE, the function only returns aggregate protein complexes. If TRUE, the function will also return subcomplexes.

ht

A logical. If FALSE, the function will not extract protein complex estimates obtained from high-throughput analysis.

dubiousGenes

A character vector of genes that will be removed when parsing the MIPS repository.
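The default patterns can be tried out on a few made-up description strings. The sketch below only illustrates how these regular expressions behave under base R's grep with perl = TRUE, and how agrep differs; the descriptions vector is hypothetical, and the code neither calls getMipsInfo nor claims to reproduce its internals.

```r
# Hypothetical complex descriptions (not taken from the MIPS file)
descriptions <- c("RNA polymerase II", "cytoplasmic ribosome",
                  "Arp2/3 complex", "some other grouping")

# Default terms as documented for wantDefault = TRUE.
# Inside an R string literal each backslash must be doubled.
defaultTerms <- c("complex", "\\Base\\b", "\\Bsome\\b")

# For each term, the indices of the descriptions it matches
hits <- lapply(defaultTerms, function(p) grep(p, descriptions, perl = TRUE))

# Descriptions matched by at least one term; "some other grouping" is
# skipped because \B requires "some" to occur inside a longer word
matched <- descriptions[sort(unique(unlist(hits)))]
print(matched)

# agrep performs approximate matching, so a misspelt term can still hit
fuzzy <- agrep("complex", c("Arp2/3 complx", "unrelated text"))
print(fuzzy)
```

Note that \B and \b are word-boundary assertions, so "\Bsome\b" matches "ribosome" but not the standalone word "some".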

Details

This function parses the MIPS protein complex database (as given by the downloaded text file) and searches for pre-determined or user-chosen terms. It returns a named list of character vectors where the names are MIPS IDs from the protein complex sub-category and each vector consists of the proteins corresponding to that particular MIPS ID. The arguments can be combined in several ways:

1. If the wantDefault parameter is TRUE, the function will grep for "complex", "\Base\b", and "\Bsome\b".

2. If toGrep is not NULL, it must be a character vector of terms and Perl regular expressions to be searched for in the MIPS database. NB: if toGrep is not NULL, then parseType should also not be NULL, as parseType indicates how each term should be searched.

3. parseType needs to be supplied if toGrep is not NULL. It is a character vector, either of length one or of length equal to the length of toGrep, detailing how each term in toGrep will be parsed in the MIPS database. If only one term is supplied for parseType, then all the terms in toGrep will be parsed identically. Otherwise, the i-th term in parseType determines the parsing of the i-th term in toGrep.

4. The eCode argument is a character vector consisting of MIPS evidence codes. A protein will be removed from the protein complex if ALL the evidence codes used to annotate the protein are supplied in the eCode argument; otherwise, it is left in the complex.

5. If the wantSubComplexes parameter is TRUE, the function will return the sub-groupings (sub-complexes or sub-structures) as given by the clusterings in the MIPS protein complex database.

6. If the ht parameter is TRUE, the function will also return those protein complex estimates obtained from high-throughput analysis.
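The evidence-code rule in item 4 can be sketched in a few lines. This is an illustration of the stated rule only, not the package's actual implementation; the protein names, the evidence list, and the code "999.99" are all made up for the example.

```r
# Hypothetical annotations: each protein maps to the evidence codes
# used to annotate it within one complex
evidence <- list(
  protA = c("901.01.03", "901.01.04"),  # every code is in the excluded set
  protB = c("901.01.03", "999.99"),     # "999.99" is a made-up, non-excluded code
  protC = c("999.99")
)

# Stand-in for the eCode argument (a short subset for illustration)
excluded <- c("901.01.03", "901.01.04")

# TRUE when ALL of a protein's evidence codes are in the excluded set
dropProtein <- vapply(evidence,
                      function(codes) all(codes %in% excluded),
                      logical(1))

# Proteins that stay in the complex
kept <- names(evidence)[!dropProtein]
print(kept)
```

Under this rule protB is kept even though one of its codes is excluded, because at least one of its annotations rests on a code outside the excluded set.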

Value

The return value is a list with two elements:

Mips

A named list of the protein complexes. Each list entry is named by its MIPS ID (with the prefix "MIPS-" attached) and holds a character vector containing the members of that protein complex.

DESC

A named character vector describing each protein complex parsed by the function. (The names are the MIPS IDs.)
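To make the shape of the return value concrete, the sketch below builds a mock object with the two documented elements and shows how it might be inspected. The IDs, protein names, and descriptions are invented placeholders; a real call to getMipsInfo would supply actual MIPS data.

```r
# Mock object with the same shape as the documented return value
mock <- list(
  Mips = list("MIPS-1" = c("protA", "protB"),
              "MIPS-2" = c("protC", "protD", "protE")),
  DESC = c("MIPS-1" = "hypothetical complex one",
           "MIPS-2" = "hypothetical complex two")
)

# Both elements are documented as having the same length and the
# MIPS IDs as names, so they can be indexed by the same ID
stopifnot(identical(names(mock$Mips), names(mock$DESC)))

# Number of member proteins per complex
complexSizes <- vapply(mock$Mips, length, integer(1))
print(complexSizes)

# Members and description of a single complex
id <- "MIPS-2"
members <- mock$Mips[[id]]
description <- mock$DESC[[id]]
```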

Author(s)

Tony Chiang

References

mips.gsf.de

Examples

#mips = getMipsInfo(wantSubComplexes = FALSE)
#mipsPhrase = getMipsInfo(wantDefault = FALSE, toGrep = "\\Bsomal\\b",
#                         parseType = "grep", wantSubComplexes = FALSE)

ScISI documentation built on Nov. 8, 2020, 5:48 p.m.