prepare_PVCA_df: prepare the weights of Principal Variance Components

View source: R/proteome_wide_diagnostics.R

prepare_PVCA_df    R Documentation

Description

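Prepares the weights of the Principal Variance Components, i.e. the share of variance attributed to each technical and biological factor, as a data frame ready for plotting.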

Usage

prepare_PVCA_df(data_matrix, sample_annotation,
  feature_id_col = "peptide_group_label",
  sample_id_col = "FullRunName",
  technical_factors = c("MS_batch", "instrument"),
  biological_factors = c("cell_line", "drug_dose"),
  fill_the_missing = -1, pca_threshold = 0.6,
  variance_threshold = 0.01)

Arguments

data_matrix

matrix of features (in rows) vs samples (in columns), with feature IDs as row names and file/sample names as column names. For more details, see help("example_proteome_matrix").

sample_annotation

data frame with:

  1. sample_id_col (this can be repeated as row names)

  2. biological covariates

  3. technical covariates (batches etc)

See help("example_sample_annotation").
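The layout of the two inputs can be sketched with toy data in base R. Everything below (run names, peptide labels, covariate values) is made up for illustration and is not taken from the package's example data; only the column names FullRunName, MS_batch and cell_line come from the defaults shown under Usage.

```r
# Toy inputs; all names and values below are made up for illustration.
set.seed(1)
sample_ids <- paste0("run_", 1:6)

# Features in rows, samples in columns, IDs stored as dimnames.
data_matrix <- matrix(
  rnorm(4 * 6), nrow = 4,
  dimnames = list(paste0("pep_", 1:4), sample_ids)
)

# One row per sample: the ID column plus technical and biological covariates.
sample_annotation <- data.frame(
  FullRunName = sample_ids,
  MS_batch    = rep(c("batch_1", "batch_2"), each = 3),
  cell_line   = rep(c("A", "B", "C"), times = 2),
  stringsAsFactors = FALSE
)

# Every matrix column should have a matching row in the annotation.
all_annotated <- all(colnames(data_matrix) %in% sample_annotation$FullRunName)
```

Checking that the column names of data_matrix all appear in the sample_id_col column before calling the function is a cheap way to catch mismatched file names early.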

feature_id_col

name of the column with feature/gene/peptide/protein ID used in the long format representation df_long. In the wide formatted representation data_matrix this corresponds to the row names.

sample_id_col

name of the column in the sample_annotation table where the filenames (colnames of the data_matrix) are found.

technical_factors

vector of sample_annotation column names that are technical covariates

biological_factors

vector of sample_annotation column names that are biologically meaningful covariates

fill_the_missing

numeric value used to substitute missing values. If NULL, features with missing values are excluded.
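The two behaviours described for fill_the_missing can be illustrated on a small matrix in base R. This is only a sketch of the idea (constant substitution versus dropping incomplete features); it is not the package's internal code, and the matrix values are made up.

```r
# Toy matrix with one missing value; values are made up for illustration.
m <- matrix(c(1.2, NA, 3.1,
              2.0, 2.5, 2.7),
            nrow = 2, byrow = TRUE,
            dimnames = list(c("pep_1", "pep_2"), c("s1", "s2", "s3")))

# Numeric value given: substitute missing values with that constant (here -1).
m_filled <- m
m_filled[is.na(m_filled)] <- -1

# NULL given: drop every feature (row) that has at least one missing value.
m_complete <- m[stats::complete.cases(m), , drop = FALSE]
```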

pca_threshold

the minimum proportion of the total variability (a value between 0 and 1, e.g. 0.6) that the selected principal components need to explain
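How such a threshold translates into a number of principal components can be sketched with stats::prcomp on random data. This assumes the threshold is compared against the cumulative proportion of variance explained; the exact selection rule used inside the package may differ, and the data here are synthetic.

```r
# Synthetic data: 20 samples, 5 features (illustration only).
set.seed(42)
x <- matrix(rnorm(20 * 5), nrow = 20)
pca <- prcomp(x, center = TRUE, scale. = FALSE)

# Proportion of total variance carried by each component, then cumulative.
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
cum_explained <- cumsum(var_explained)

pca_threshold <- 0.6
# Smallest number of leading components whose cumulative share reaches the threshold.
n_pcs <- which(cum_explained >= pca_threshold)[1]
```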

variance_threshold

the minimum proportion of variance (a value between 0 and 1, e.g. 0.01) that a covariate needs to explain to be reported separately; covariates below this threshold are lumped together
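The lumping step can be pictured as follows. The weights and the label "other" are hypothetical, chosen only to show the arithmetic; the labels and grouping produced by the package may be different.

```r
# Hypothetical variance weights per covariate (made up; they sum to 1).
weights <- c(MS_batch = 0.42, Diet = 0.005, Sex = 0.008,
             Strain = 0.30, resid = 0.267)
variance_threshold <- 0.01

# Covariates explaining less than the threshold are merged into one entry.
small  <- weights < variance_threshold
lumped <- c(weights[!small], other = sum(weights[small]))
```

Here Diet and Sex fall below 0.01, so they are combined into a single entry while the total weight is unchanged.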

Value

data frame with weights and factors, combined in a way ready for plotting

Examples

library(proBatch)

# Use the first 150 features of the example matrix to keep the run short
matrix_test <- example_proteome_matrix[1:150, ]
pvca_df_res <- prepare_PVCA_df(matrix_test, example_sample_annotation,
  technical_factors = c("MS_batch", "digestion_batch"),
  biological_factors = c("Diet", "Sex", "Strain"),
  pca_threshold = .6, variance_threshold = .01, fill_the_missing = -1)

symbioticMe/proBatch documentation built on April 9, 2023, 11:59 a.m.