pgcaDict: Link Protein Groups Created from MS/MS Data

Description

Build a dictionary for protein groups from MS/MS data. Details of the algorithm can be found in Takhar et al. (under revision), "PGCA: An Algorithm to Link Protein Groups Created from MS/MS Data."

Usage

pgcaDict(..., col.mapping, master.gene.identifier)

Arguments

...

arbitrary number of directory names, file names, or data.frames used as input.

col.mapping

column mapping (see Details).

master.gene.identifier

if given, genes with this value in the column group.identifier are considered master genes.

Details

If the group.identifier column is logical (i.e., TRUE or FALSE) or master.gene.identifier is given, each accession marked TRUE (or whose group.identifier equals master.gene.identifier) is assumed to be a "master gene", and the data set is assumed to be in the correct order. That is, all FALSE rows following a master gene are assumed to belong to that master gene's group.
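The grouping rule above can be illustrated in base R. This is only a minimal sketch of the idea, not the package's implementation: the helper label_groups and the toy data are hypothetical, and it assumes the rows are already ordered so that each master gene precedes the members of its group.

```r
# Hypothetical helper (not part of pgca): assign a running group number to
# each row, starting a new group at every master gene.
label_groups <- function(group.identifier, master.gene.identifier = NULL) {
  is.master <- if (is.null(master.gene.identifier)) {
    as.logical(group.identifier)              # logical column: TRUE marks a master gene
  } else {
    group.identifier == master.gene.identifier  # otherwise compare to the given value
  }
  cumsum(is.master)                           # each TRUE increments the group counter
}

# Toy data, invented for illustration
toy <- data.frame(
  N = c(TRUE, FALSE, FALSE, TRUE, TRUE, FALSE),
  Accession = c("P1", "P2", "P3", "P4", "P5", "P6"),
  stringsAsFactors = FALSE
)

toy$group <- label_groups(toy$N)

# Accessions collected per group
groups <- split(toy$Accession, toy$group)
print(groups)

# Same rule with a non-logical identifier column and master.gene.identifier = "M"
groups.chr <- label_groups(c("M", "", "", "M"), master.gene.identifier = "M")
```

Because the rule depends only on row order, reordering the input rows would change the resulting groups, which is why the data set is assumed to be in the correct order.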

The col.mapping argument maps the column names in the data files to a specific function. It needs to be a named character vector in which the name of each item is the "function" of the given column name. The algorithm knows about the following columns:

"group.identifier"

Column containing the group identifier.

"accession.nr"

Column containing the accession number.

"protein.name"

Column containing the protein name.

"gene.symbol"

Column containing the gene symbol (optional; can be missing).

The default column mapping is c(group.identifier="N", accession.nr="Accession", protein.name="Protein_Name"). The supplied column mapping can omit any column that is already correct in the default mapping. For instance, if the accession number is stored in column AccessionNr instead of Accession, but the remaining columns are the same as in the default mapping, specifying col.mapping=c(accession.nr="AccessionNr") is sufficient.
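One way to picture how a partial col.mapping combines with the defaults is the base R sketch below. It is an illustration under assumptions, not the package's code: merge_mapping is a hypothetical helper, and only the default values are taken from the text above.

```r
# Default mapping as documented above
default.mapping <- c(group.identifier = "N",
                     accession.nr = "Accession",
                     protein.name = "Protein_Name")

# Hypothetical helper (not part of pgca): entries supplied by the user
# override the defaults by name; names not in the defaults are appended.
merge_mapping <- function(user.mapping, defaults = default.mapping) {
  merged <- defaults
  merged[names(user.mapping)] <- user.mapping
  merged
}

# Only the accession column differs from the default
m1 <- merge_mapping(c(accession.nr = "AccessionNr"))
print(m1)

# Adding the optional gene.symbol column keeps the three defaults intact
m2 <- merge_mapping(c(gene.symbol = "Gene_Symbol"))
print(m2)
```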

Value

An object of type pgcaDict.

References

Takhar M, Sasaki M, Hollander Z, McManus B, McMaster W, Ng R and Cohen Freue G (Under revision). "PGCA: An Algorithm to Link Protein Groups Created from MS/MS Data." PLOS ONE.

See Also

applyDict to apply the dictionary to the data files and saveDict to save the dictionary itself.

Examples

# Build the dictionary from all files in a directory,
# specifying that the column "Gene_Symbol" holds the `gene.symbol`.
dict.dir <- pgcaDict(
         system.file("extdata", package="pgca"),
         col.mapping=c(gene.symbol="Gene_Symbol")
)

# Build the dictionary from a list of files
dict.files <- pgcaDict(
     system.file("extdata", "BET1947_v339.txt", package="pgca"),
     system.file("extdata", "BET2007_v339.txt", package="pgca"),
     system.file("extdata", "BET2047_v339.txt", package="pgca"),
     col.mapping=c(gene.symbol="Gene_Symbol")
)

# Build the dictionary from data.frames that have already been read in
# (BET1947_v339 and BET2047_v339 must exist in the workspace)
dict.data <- pgcaDict(BET1947_v339, BET2047_v339,
                      col.mapping=c(gene.symbol="Gene_Symbol"))

gcohenfr/pgca documentation built on Aug. 28, 2021, 2:57 p.m.