View source: R/modelProteins.R
pre_process | R Documentation |
This function pre-processes protein intensity data from the top differentially expressed proteins identified with find_dep for modeling.
pre_process(
fit_df,
norm_df,
sig = "adjP",
sig_cutoff = 0.05,
fc = 1,
n_top = 20,
find_highcorr = TRUE,
corr_cutoff = 0.9,
save_corrmatrix = FALSE,
file_path = NULL,
rem_highcorr = TRUE
)
fit_df |
A |
norm_df |
The |
sig |
Criteria to denote significance in differential expression.
Choices are "adjP" (default) and "P". |
sig_cutoff |
Cutoff value for p-values and adjusted p-values in
differential expression. Default is 0.05. |
fc |
Minimum absolute log-fold change to use as threshold for
differential expression. Default is 1. |
n_top |
The number of top hits from find_dep to include. Default is 20. |
find_highcorr |
Logical. If TRUE (default), highly correlated proteins are identified. |
corr_cutoff |
A numeric value specifying the correlation cutoff.
Default is 0.9. |
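save_corrmatrix |
Logical. If TRUE, the protein correlation matrix is saved to the directory given by file_path. Default is FALSE. |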
file_path |
A string containing the directory path to save the file. |
rem_highcorr |
Logical. If TRUE (default), highly correlated proteins are removed. |
This function creates a data frame that contains protein intensities for a user-specified number of top differentially expressed proteins.
Using find_highcorr = TRUE, highly correlated proteins can be identified, and can be removed with rem_highcorr = TRUE.
Note: Most models benefit from reduced correlation between proteins (predictors or features). We therefore recommend removing highly correlated proteins at this stage to reduce pairwise correlation.
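To illustrate what a pairwise-correlation filter does, here is a minimal base R sketch on toy data. It is not the package's implementation: the data, the protein names P1 to P4, and the rule of dropping the later column of each highly correlated pair are all assumptions made for illustration. The function listed under See Also, caret's findCorrelation, uses a different rule to choose which member of a pair to drop.

```r
# Toy intensity data: rows are samples, columns are proteins.
# All names and values here are hypothetical.
P1 <- c(1, 2, 3, 4, 5, 6, 7, 8)
P3 <- c(1, -1, 1, -1, 1, -1, 1, -1)
toy <- data.frame(
  P1 = P1,
  P2 = 2 * P1 + 0.1 * P3,           # almost a rescaled copy of P1
  P3 = P3,
  P4 = c(2, 2, 1, 1, 2, 2, 1, 1)
)

corr_cutoff <- 0.9  # same value as the default shown in the usage

# Absolute pairwise correlations. Zero out the lower triangle and the
# diagonal so that each pair of proteins is checked exactly once.
cm <- abs(cor(toy))
cm[lower.tri(cm, diag = TRUE)] <- 0

# Flag the later column of every pair whose correlation exceeds the cutoff.
high_corr <- colnames(cm)[apply(cm, 2, function(x) any(x > corr_cutoff))]

# Drop the flagged proteins.
toy_reduced <- toy[, !(colnames(toy) %in% high_corr), drop = FALSE]
```

Raising corr_cutoff (for example to 0.95) makes the filter less aggressive, so fewer proteins are flagged.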
If no or few proteins meet the significance threshold for differential expression, you may adjust sig, fc, and/or sig_cutoff accordingly to obtain more proteins for modeling.
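The effect of relaxing these thresholds can be sketched on a toy results table. Everything below is hypothetical: the column names (protein, logFC, P, adjP), the values, and the helper select_top are illustrative only and are not taken from the package. Whether the comparisons in the actual function are strict or inclusive is also an assumption.

```r
# Toy differential-expression results (hypothetical columns and values).
toy_fit <- data.frame(
  protein = c("A", "B", "C", "D", "E"),
  logFC   = c(2.1, -1.4, 0.8, 1.2, -0.3),
  P       = c(0.001, 0.004, 0.010, 0.030, 0.200),
  adjP    = c(0.005, 0.020, 0.040, 0.075, 0.250)
)

# Hypothetical helper: keep proteins that pass both the significance and the
# absolute log-fold-change thresholds, then return up to n_top of them,
# ordered from most to least significant.
select_top <- function(df, sig = "adjP", sig_cutoff = 0.05, fc = 1, n_top = 20) {
  keep <- df[df[[sig]] < sig_cutoff & abs(df$logFC) >= fc, , drop = FALSE]
  keep <- keep[order(keep[[sig]]), , drop = FALSE]
  head(keep$protein, n_top)
}

strict  <- select_top(toy_fit)                        # default thresholds
relaxed <- select_top(toy_fit, sig = "P")             # unadjusted p-values
loosest <- select_top(toy_fit, sig = "P", fc = 0.5)   # also lower the fold-change threshold
```

In this toy table, each relaxation admits one more protein than the previous setting, which mirrors the advice above for cases where too few proteins pass the default thresholds.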
A model_df object, which is a data frame of protein intensities with proteins indicated by columns.
Chathurani Ranathunge
find_dep, normalize_data
caret: findCorrelation
## Create a model_df object with default settings.
covid_model_df1 <- pre_process(fit_df = covid_fit_df, norm_df = covid_norm_df)
## Change the correlation cutoff.
covid_model_df2 <- pre_process(covid_fit_df, covid_norm_df, corr_cutoff = 0.95)
## Change the significance criteria to include more proteins
covid_model_df3 <- pre_process(covid_fit_df, covid_norm_df, sig = "P")
## Change the number of top differentially expressed proteins to include
covid_model_df4 <- pre_process(covid_fit_df, covid_norm_df, sig = "P", n_top = 24)