czang97/mpred: Predict unmeasured metabolites

An R package to predict unmeasured metabolites.
Metabolites are used in studies of many cancer types, but it is very hard to measure every metabolite in every patient. This tool helps you predict unmeasured metabolites from the measured ones.

Installation

You can install mpred from GitHub using the following commands:

install.packages("devtools")
library(devtools)
devtools::install_github("czang97/mpred")

Extract sample data

Metabolite data from 7 studies are included in the mpred package.

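library(mpred)  # load the package so its bundled datasets are available

# Copy the bundled datasets and their sample information into the workspace
RC12 <- RC12
RC18 <- RC18
RC12_sampleinfo <- RC12_sampleinfo
RC18_sampleinfo <- RC18_sampleinfo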

Prepare data

If you do not have a hold-out dataset (that is, you are working with a single dataset), use prepare_data, then split the dataset into a training set and a test set.

library(dplyr)  # provides sample_frac() and setdiff() for data frames

RC12_tumor <- prepare_data(df = RC12, df_sampleinfo = RC12_sampleinfo, df_sample_name = "SAMPLE_NAME", type = "tumor")

# Tidy dataset
t_df_tumor <- df_tidy(RC12_tumor, standardize = "z")

# Separate dataset into training set and test set
set.seed(58)
t_df_tumor_train <- sample_frac(t_df_tumor, 0.7)  # train
t_df_tumor_test <- setdiff(t_df_tumor, t_df_tumor_train)  # test
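
The 70/30 split above can also be written with row indices in base R. The following is a minimal sketch on a hypothetical toy data frame (toy, id, and value are illustrative names, not part of mpred). Splitting by index keeps every row exactly once, whereas set operations on data frames treat identical rows as a single row.

```r
# Toy data frame standing in for a tidied dataset (hypothetical)
set.seed(58)
toy <- data.frame(id = 1:10, value = rnorm(10))

# Draw 70% of the row indices for training; the remaining rows form the test set
train_idx <- sample(nrow(toy), size = round(0.7 * nrow(toy)))
toy_train <- toy[train_idx, ]
toy_test <- toy[-train_idx, ]

nrow(toy_train)  # 7
nrow(toy_test)   # 3
```

The two subsets are disjoint by construction, so no sample is used for both fitting and evaluation.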

If you have a hold-out dataset, for example when you train on a first dataset and test on a second dataset from a different study, use the subset_data function:

# Store the result under a name that does not mask base::list()
matched <- subset_data(df1 = RC12, df1_sampleinfo = RC12_sampleinfo, df2 = RC18, df2_sampleinfo = RC18_sampleinfo, df1_sample_name = "SAMPLE_NAME", df2_sample_name = "SAMPLE_NAME")
RC12_tumor_match <- matched[[1]]
RC18_tumor_match <- matched[[2]]


# Tidy dataset
t_RC12_tumor <- df_tidy(RC12_tumor_match, standardize = "z")
t_RC18_tumor <- df_tidy(RC18_tumor_match, standardize = "z")

# get metabolites id
m_id_sort <- get_m_id_vector(RC12_tumor_match)
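
To make the idea behind this preparation step concrete, here is a rough base R sketch of two operations it plausibly involves: restricting two studies to the metabolites they share, and z-scoring each metabolite. This is an assumption about what subset_data and df_tidy(standardize = "z") do, not a description of their implementation. The toy data, the column names m1 to m4, and the layout (samples in rows, metabolites in columns) are all hypothetical.

```r
# Toy studies: rows are samples, columns are metabolites (assumed layout)
set.seed(42)
study_a <- data.frame(m1 = rnorm(5, 10, 2), m2 = rnorm(5, 3, 1), m3 = rnorm(5, 7, 1))
study_b <- data.frame(m2 = rnorm(4, 3, 1), m3 = rnorm(4, 7, 1), m4 = rnorm(4, 1, 1))

# Keep only the metabolites measured in both studies
shared <- intersect(names(study_a), names(study_b))
a_match <- study_a[, shared, drop = FALSE]
b_match <- study_b[, shared, drop = FALSE]

# z-score each metabolite: subtract the column mean, divide by the column sd
a_z <- as.data.frame(scale(a_match))
b_z <- as.data.frame(scale(b_match))

shared  # "m2" "m3"
```

Standardizing each study separately puts metabolites on a comparable scale across studies, which matters when a model trained on one study is applied to another.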

Build LASSO model and evaluate

Fit a LASSO model for each metabolite and evaluate the fit. The loop below outputs a vector of mean squared errors (MSE) and a vector of r2 values, with one entry per metabolite:

MSE <- numeric(length(m_id_sort))  # vector of MSE values, one per metabolite
r2 <- numeric(length(m_id_sort))   # vector of r2 values, one per metabolite
for (i in seq_along(m_id_sort)) {
    set.seed(1)  # reset the seed so each fit is reproducible
    MSE[i] <- LASSO_model2_fit(t_df_tumor_train, t_df_tumor_test, m_id_sort, i, eval = "mse")
    r2[i] <- LASSO_model2_fit(t_df_tumor_train, t_df_tumor_test, m_id_sort, i, eval = "r2")
}


# Extract the coefficients of the LASSO models
# (df is a placeholder for your tidied data frame)
coef_list <- LASSO_coef(df, m_id_sort)
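
For readers who want to see what such a fit involves, the sketch below predicts one simulated metabolite from the others using the glmnet package, then computes MSE and r2 on a held-out set and lists the non-zero coefficients. It is an illustration under stated assumptions, not the internals of LASSO_model2_fit or LASSO_coef: the simulated data, the names m1 to m10, the use of lambda.min, and the r2 definition (1 minus residual over total sum of squares) are all choices made for this example.

```r
library(glmnet)

# Simulated standardized metabolite matrix: 100 samples, 10 metabolites (hypothetical)
set.seed(1)
n <- 100
p <- 10
X <- matrix(rnorm(n * p), n, p, dimnames = list(NULL, paste0("m", 1:p)))
# Target metabolite depends on m1 and m2 only, plus noise
y <- 2 * X[, "m1"] - X[, "m2"] + rnorm(n, sd = 0.5)

train <- 1:70
test <- 71:100

# alpha = 1 selects the LASSO penalty; lambda is chosen by cross-validation
cv_fit <- cv.glmnet(X[train, ], y[train], alpha = 1)
pred <- as.numeric(predict(cv_fit, newx = X[test, ], s = "lambda.min"))

# Evaluation on the held-out samples
mse_val <- mean((y[test] - pred)^2)
r2_val <- 1 - sum((y[test] - pred)^2) / sum((y[test] - mean(y[test]))^2)

# Predictors with non-zero coefficients at the selected lambda
coefs <- as.matrix(coef(cv_fit, s = "lambda.min"))
nonzero <- rownames(coefs)[coefs[, 1] != 0]
```

Because the LASSO penalty shrinks many coefficients to exactly zero, nonzero gives a compact list of the measured metabolites that the model actually uses for the prediction.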

You can also plot predicted versus actual values:

# Build one plot per metabolite
plot_list <- plot_actual_vs_fit(t_df_tumor_train, t_df_tumor_test, m_id_sort)

pdf("plot_list.pdf")  # save all plots to a single PDF
for (i in seq_along(m_id_sort)) {
    print(plot_list[[i]])
}
dev.off()
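
If you want to see what a single predicted-versus-actual plot looks like without running the whole pipeline, the following base R sketch draws one from simulated values and writes it to a temporary PDF. The data, the file name, and the plot title are hypothetical, and the output of plot_actual_vs_fit may be styled differently.

```r
# Simulated actual and predicted values for one hypothetical metabolite
set.seed(1)
actual <- rnorm(30)
predicted <- actual + rnorm(30, sd = 0.3)

# Write the plot to a PDF in the session's temporary directory
out_file <- file.path(tempdir(), "actual_vs_predicted.pdf")
pdf(out_file)
plot(actual, predicted,
     xlab = "Actual", ylab = "Predicted",
     main = "Hypothetical metabolite")
abline(a = 0, b = 1, lty = 2)  # identity line: points on it are predicted exactly
dev.off()
```

Points close to the dashed identity line indicate accurate predictions; systematic departures from it suggest bias in the model for that metabolite.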