preprocess: Preprocess data from a MS-GF mzID file.
In compomics/search-all-assess-subset: An implementation of the Search All, Asses Subset strategy for FDR estimation shotgun proteomics.

Description Usage Arguments Value Examples

The parsed data frame from saas::parse_msgf_mzid function contains sometimes multiple entries for a spectrum. (eg. if sequence can be assigned to multiple protein ids). This function takes care of this by default.

1 2	preprocess(dat, remove_target_decoy_PSM = TRUE, remove_multiple_proteins_PSM = FALSE, is_subset = NULL)

`dat`	Data frame generated by the saas::parse_msgf_mzid function.
`remove_target_decoy_PSM`	TRUE to remove PSMs that match both a target and decoy sequence.
`remove_multiple_proteins_PSM`	TRUE to remove PSMs that can be assigned to multiple protein ids.
`is_subset`	Location of fasta file with protein_id of the subset of interest in the fasta headers.

Data frame with the same columns as “dat”. The column protein_id contains all protein_ids that can be assigned to this PSM. Multiple protein_ids are separated by “;”. When “is_subset” is specified, two columns are added:

subset: TRUE if sequence can be assigned to a subset protein id
non_subset: TRUE if sequence can be assigned to a non subset protein id

Every spectrum haves only 1 row in the data frame.

## Location of the zipped data files
zip_file_path = system.file("extdata", "extdata.zip", package = "saas")

## Unzip and get the (temporary) location of the mzid file with the MS-GF+ search results from a
## competitive target decoy search of the complete pyrococcus proteome against a pyrococcus dataset.
mzid_file_path = unzip(zip_file_path, 'pyrococcus.mzid',exdir = tempdir())
## Read and parse the mzid file
data = parse_msgf_mzid(mzid_file_path)

## Unzip and get the (temporary) location of the file with fasta headers.
## Each fasta header contains a protein_id from the protein subset of interest.
## These protein_ids match the protein_ids in the mzid result file.
fasta_file_path = unzip(zip_file_path, 'transferase_activity_[GO:0016740].fasta', exdir = tempdir())

## Preprocess the data before FDR estimation.
data_prep = preprocess(data, is_subset = fasta_file_path)

## Estimate the FDR in the subset.
data_result = calculate_fdr(data_prep, score_higher = FALSE)
## Check how many PSMs are retained at the 1% FDR threshold.
table(data_result$FDR_stable < .01)

compomics/search-all-assess-subset documentation built on May 13, 2019, 9:55 p.m.

compomics/search-all-assess-subset index

README.md

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

compomics/search-all-assess-subset
An implementation of the Search All, Asses Subset strategy for FDR estimation shotgun proteomics.

preprocess: Preprocess data from a MS-GF mzID file.
In compomics/search-all-assess-subset: An implementation of the Search All, Asses Subset strategy for FDR estimation shotgun proteomics.

Description

Usage

Arguments

Value

Examples

Related to preprocess in compomics/search-all-assess-subset...

R Package Documentation

Browse R Packages

We want your feedback!

compomics/search-all-assess-subset An implementation of the Search All, Asses Subset strategy for FDR estimation shotgun proteomics.

preprocess: Preprocess data from a MS-GF mzID file. In compomics/search-all-assess-subset: An implementation of the Search All, Asses Subset strategy for FDR estimation shotgun proteomics.

Description

Usage

Arguments

Value

Examples

Related to preprocess in compomics/search-all-assess-subset...

R Package Documentation

Browse R Packages

We want your feedback!

compomics/search-all-assess-subset
An implementation of the Search All, Asses Subset strategy for FDR estimation shotgun proteomics.

preprocess: Preprocess data from a MS-GF mzID file.
In compomics/search-all-assess-subset: An implementation of the Search All, Asses Subset strategy for FDR estimation shotgun proteomics.