preprocess_pqpq_input: Preprocess data for entry into PQPQ software


View source: R/preprocess_pqpq_data.R

Description

This function is used to prepare Protein Pilot data for analysis in PQPQ.

Usage

preprocess_pqpq_input(df, data_type = "Protein Pilot",
  protein_subset = NULL, separate_multiple_protein_IDs = FALSE,
  sample_names = NULL, manually_annotated_fields = NULL)

Arguments

df

A data frame (see details)

data_type

One of 'Protein Pilot', 'Spectrum Mill', or 'Proteome Discoverer', depending on the software that produced the file.

protein_subset

Identifiers for the proteins to be analyzed.

separate_multiple_protein_IDs

If TRUE, peptides assigned to multiple proteins are duplicated, once for each protein to which they are assigned. Otherwise, they are analyzed as a group.

sample_names

A character vector identifying the columns holding sample data.

manually_annotated_fields

A list containing annotation data.
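
To illustrate the separate_multiple_protein_IDs option described above, the following base R sketch duplicates a peptide row once per assigned protein. It is not the package's implementation: the toy data frame, the ";" separator between accessions, and the helper name split_protein_ids are all assumptions made for illustration.

```r
# Toy peptide table; the column names and the ";" separator are assumptions.
peptides <- data.frame(
  Accessions = c("P1", "P2;P3"),
  Sequence   = c("AAAK", "CCCR"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return one row per (peptide, protein) pair.
split_protein_ids <- function(df, id_col = "Accessions", sep = ";") {
  ids <- strsplit(df[[id_col]], sep, fixed = TRUE)
  # Repeat each row once for every protein ID it carries.
  out <- df[rep(seq_len(nrow(df)), lengths(ids)), , drop = FALSE]
  out[[id_col]] <- trimws(unlist(ids))
  rownames(out) <- NULL
  out
}

separated <- split_protein_ids(peptides)
```

In this sketch the peptide shared by two proteins appears twice in the result, once under each accession, while the unshared peptide is left unchanged.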

Details

Note: this function has only been tested on the example Protein Pilot output provided by Jenny Forshed in the PQPQ package, read into R using the openxlsx package. The formatting of data from other

Value

A list, containing the cleaned data.frame and the list of labeled variable names.

Examples

# Minimal example of the column identifiers list
data("testdata2")

# Minimal column_ids example
column_ids <- list(
  protein_id = "Accessions",
  confidence = "Conf",
  samples_names = grep("Area", names(testdata2), value = TRUE),
  peptide_ids = c("Sequence", "peptide_id")
)
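
The grep() call in the example above selects sample columns by name. The sketch below shows the same pattern on a small stand-in data frame, so it runs without the package installed. The data frame toy and its column names are invented for illustration and are not the contents of testdata2.

```r
# Toy stand-in for testdata2; these column names are assumptions.
toy <- data.frame(
  Accessions = "P1",
  Conf       = 99,
  Sequence   = "AAAK",
  `Area 113` = 1.0,
  `Area 114` = 2.0,
  check.names = FALSE
)

# value = TRUE returns the matching column names rather than their positions.
sample_cols <- grep("Area", names(toy), value = TRUE)

# With the PQPQ package loaded, a call following the Usage signature might
# look like the line below. This is an untested assumption, not a verified run:
# result <- preprocess_pqpq_input(testdata2, data_type = "Protein Pilot",
#                                 sample_names = column_ids$samples_names)
```

Passing check.names = FALSE keeps the space in the toy column names, which makes the matched names easier to compare with spreadsheet-style headers.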

melissakey/PQPQ documentation built on May 4, 2019, 7:42 p.m.