df2protdata: Data frame to protein data

View source: R/df2protdata.R

df2protdata    R Documentation

Data frame to protein data

Description

Converts the data frame df to a protdata object, which can be used for further analysis.

Usage

df2protdata(df, acc_col, quant_cols, quant_name = "quant_value",
  run_name = NULL, annotations = NULL)

Arguments

df

A data frame that needs to be converted to a protdata object.

acc_col

A character string or numeric index indicating the column in data frame df that contains the identifiers by which the data should be grouped in the resulting protdata object.

quant_cols

A vector of character strings or numeric indices indicating which column(s) in the data frame df contain the quantitative values of interest (mostly peptide intensities or peptide areas under the curve). If quant_cols contains only one element, df2protdata will assume that the data frame df is in "long" format. If quant_cols contains more than one element, df2protdata will assume the data to be in "wide" format.

quant_name

A character string indicating the name that will be given to the column that will contain the quantitative values of interest (mostly peptide intensities or peptide areas under the curve). Defaults to "quant_value".

run_name

If quant_cols contains more than one element (i.e. the data is in "wide" format), this should contain a freely chosen character string indicating the name that will be given to the column containing the mass spec run names. If no name is chosen (run_name=NULL), this will default to "run". If quant_cols contains only one element (i.e. the data is in "long" format), run_name should contain the name of the column that contains the run names.
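The difference between the two layouts can be sketched in base R. This is only an illustration on a hypothetical toy data frame (the column names Proteins, Sequence, Intensity.run1 and Intensity.run2 are made up); it does not show how df2protdata reshapes data internally.

```r
# Hypothetical toy data in "wide" format: one intensity column per run.
wide <- data.frame(
  Proteins = c("P1", "P1", "P2"),
  Sequence = c("AAK", "CCR", "DDK"),
  Intensity.run1 = c(10.1, 11.2, 9.8),
  Intensity.run2 = c(10.4, NA, 9.5),
  stringsAsFactors = FALSE
)
quant_cols <- c("Intensity.run1", "Intensity.run2")

# Equivalent "long" layout: one row per peptide-run combination,
# a single quantitative column and a column holding the run names.
long <- data.frame(
  Proteins = rep(wide$Proteins, times = length(quant_cols)),
  Sequence = rep(wide$Sequence, times = length(quant_cols)),
  run = rep(quant_cols, each = nrow(wide)),
  quant_value = unlist(wide[, quant_cols], use.names = FALSE),
  stringsAsFactors = FALSE
)

# With MSqRob loaded, the two layouts would be passed as follows (not run here):
# df2protdata(wide, "Proteins", quant_cols, run_name = "run")
# df2protdata(long, "Proteins", "quant_value", run_name = "run")
```

In the wide call, run_name is a freely chosen label for the new run column; in the long call, it names the existing column of df that holds the run names.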

annotations

A vector of character strings or numeric indices indicating the columns in the data frame df that contain additional information on the accessions (typically protein names, gene names, gene ontologies, ...) that should be added in a separate annotation slot. If an annotation column holds multiple values for the same accession, these values are pasted together. Defaults to NULL, in which case no annotations will be added.
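What "pasted together" could look like is sketched below in base R on a hypothetical toy data frame. The ";" separator and the use of unique() are assumptions of this sketch; the separator and de-duplication actually used by df2protdata are not stated here.

```r
# Hypothetical toy data: accession "P1" has two distinct gene names.
df <- data.frame(
  Proteins = c("P1", "P1", "P2"),
  Gene = c("geneA", "geneB", "geneC"),
  stringsAsFactors = FALSE
)

# Collapse the annotation column to one string per accession.
# Assumption: ";" as separator and duplicates removed with unique().
collapsed <- tapply(df$Gene, df$Proteins,
                    function(x) paste(unique(x), collapse = ";"))
```

The result holds one annotation string per unique accession, which is the shape an annotation slot keyed by accession would need.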

Value

A protdata object.

Examples

#This example converts the data frame pepdf into the protdata object proteins.
#Import the data as a data frame
pepdf <- read.table(system.file("extdata/CPTAC", "peptides.txt", package = "MSqRob"), sep="\t", header=TRUE)
#To save time, we only take the first 50 peptides as an example
pepdf <- pepdf[1:50,]
#Determine columns that contain the intensity values
quant_cols <- grep("Intensity.", colnames(pepdf), value=TRUE)
#Log2 transform data and change -Inf to NA
pepdf[,quant_cols] <- log2(pepdf[,quant_cols])
tmp_ints <- pepdf[,quant_cols]
tmp_ints <- as.matrix(tmp_ints)
tmp_ints[is.infinite(tmp_ints)] <- NA
tmp_ints <- as.data.frame(tmp_ints)
pepdf[,quant_cols] <- tmp_ints
#Keep only columns of interest
pepdf <- pepdf[,c(quant_cols,"Proteins","Sequence","PEP")]
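#Determine the column that contains the protein names
acc_col <- "Proteins"
#Convert the data frame (wide format, as quant_cols has more than one element) to a protdata object
proteins <- df2protdata(pepdf, acc_col, quant_cols)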

statOmics/MSqRob documentation built on Dec. 8, 2022, 6 a.m.