prot_level_multiMat_PresAbs: Multi-Matrix Presence Absence analysis

View source: R/TwoPart_MultiMS.R

prot_level_multiMat_PresAbsR Documentation

Multi-Matrix Presence Absence analysis

Description

Multi-Matrix Presence Absence Analysis computes Model-Based statistics for each dataset and sums them up to produce the final statistic. The significance is determined via a permutation test which computes the same statistics and sums them after permuting the values across treatment groups, as is outlined in Karpievitch et al. 2018. Whenever possible proteins should be analysed using the Model-Based Differential Expression Analysis due to higher statistical power over the Presence Absence analysis.

Usage

prot_level_multiMat_PresAbs(
  mm_list,
  treat,
  prot.info,
  prot_col_name,
  nperm = 500,
  dataset_suffix
)

Arguments

mm_list

list of matrices of intensities for each experiment, dimentions: numpeptides x numsamples

treat

list of data frames with treatment information to compute the statistic, parallel to mm_list and prot.info

prot.info

list of protein metadata for each matrix in mm_list, data.frame parallel to mm_list and treat

prot_col_name

column names present in all datasets that identifies protein IDs across all datasets

nperm

number of permutations

dataset_suffix

a list of strings that will be appended to the column names for FC, PV, BHPV and numebers of peptides

Value

a data frame with the following columns:

protIDused

protein metadata, peptide sequence if was passed in as one of the columns is the first peptide equence encountered in the data for that protein

FCs

Avegares across all datasets of the approximation of the fold change computed as percent missing observations group 1 munis in percent missing observations group 2 in peptideLevel_PresAbsDE() function

P_val

p-value for the comparison between 2 groups (2 groups only here) obtained from a permutation test

BH_P_val

Benjamini-Hochberg adjusted p-values

statistic

statistic returned by the g-test and summed across all datasets, not very useful as depends on the direction of the test and can produce all 0's

u_prot_info

column containing ptoein identifiers across all datasets

FCs

Approximation of the fold change computed as percent missing observations group 1 munis in percent missing observations group 2 in peptideLevel_PresAbsDE() function

PV

p-values produced by g-test for individual datasets

BHPV

adjusted p-values produced by g-test for individual datasets

NUMPEP

number of peptides observed for each protein in each of the datasets

Examples

# Load mouse dataset
data(mm_peptides)
head(mm_peptides)
intsCols = 8:13
metaCols = 1:7
m_logInts = make_intencities(mm_peptides, intsCols)  # will reuse the name
m_prot.info = make_meta(mm_peptides, metaCols)
m_logInts = convert_log2(m_logInts)
grps = as.factor(c('CG','CG','CG', 'mCG','mCG','mCG'))

set.seed(135)
mm_m_ints_eig1 = eig_norm1(m=m_logInts,treatment=grps,prot.info=m_prot.info)
mm_m_ints_eig1$h.c # check the number of bias trends detected
mm_m_ints_norm = eig_norm2(rv=mm_m_ints_eig1)

# Load human dataset
data(hs_peptides)
head(hs_peptides)
intsCols = 8:13
metaCols = 1:7
m_logInts = make_intencities(hs_peptides, intsCols)
m_prot.info = make_meta(hs_peptides, metaCols)
m_logInts = convert_log2(m_logInts)
grps = as.factor(c('CG','CG','CG', 'mCG','mCG','mCG'))

set.seed(137)
hs_m_ints_eig1 = eig_norm1(m=m_logInts,treatment=grps,prot.info=m_prot.info)
hs_m_ints_eig1$h.c # check the number of bias trends detected
hs_m_ints_norm = eig_norm2(rv=hs_m_ints_eig1)

# Set up for presence/absence analysis
raw_list = list()
norm_imp_prot.info_list = list()
raw_list[[1]] = mm_m_ints_eig1$m
raw_list[[2]] = hs_m_ints_eig1$m
norm_imp_prot.info_list[[1]] = mm_m_ints_eig1$prot.info
norm_imp_prot.info_list[[2]] = hs_m_ints_eig1$prot.info

protnames_norm_list = list()
protnames_norm_list[[1]] = unique(mm_m_ints_norm$normalized$MatchedID)
protnames_norm_list[[2]] = unique(hs_m_ints_norm$normalized$MatchedID)

presAbs_dd = get_presAbs_prots(mm_list=raw_list,
                              prot.info=norm_imp_prot.info_list,
                              protnames_norm=protnames_norm_list,
                              prot_col_name=2)

ints_presAbs = list()
protmeta_presAbs = list()
ints_presAbs[[1]] = presAbs_dd[[1]][[1]] # Mouse
ints_presAbs[[2]] = presAbs_dd[[1]][[2]] # HS
protmeta_presAbs[[1]] = presAbs_dd[[2]][[1]]
protmeta_presAbs[[2]] = presAbs_dd[[2]][[2]]

treats = list()
treats[[1]] = grps
treats[[2]] = grps

subset_presAbs = subset_proteins(mm_list=ints_presAbs,
                        prot.info=protmeta_presAbs, 'MatchedID')

nperm = 50  # set to 500+ for publication
set.seed(275937)
presAbs_comb = prot_level_multiMat_PresAbs(
                           mm_list=subset_presAbs$sub_mm_list,
                           treat=treats,
                           prot.info=subset_presAbs$sub_prot.info,
                           prot_col_name='MatchedID', nperm=nperm,
                           dataset_suffix=c('MM', 'HS') )

plot_volcano(presAbs_comb$FC, presAbs_comb$BH_P_val,
             FC_cutoff=.5, PV_cutoff=.05,
             'Combined Pres/Abs CG vs mCG')


YuliyaLab/ProteoMM documentation built on April 19, 2022, 8:12 a.m.