knitr::opts_chunk$set(echo = TRUE)

Data

To simulate the T matrix of cell type-specific methylation profiles, we use different studies from GEO. For the cancerous epithelial line, we use the cell line GSM1560930 and for the cancerous mesenchymal line, we use the cell line GSM1560925 from the same study. For the fibroblast line, we use the cell line GSM1354676 after conversion from m-value to b-value. For the T cell line, we use the GSM1641099 and for the control epithelial cell, we use the GSM2743808.

plate_effect contains 22 median plate effects (TCGA experimental batch effect) for 1000 random probes. For each probe, we model plate effects using multiplicative coefficients that measure the ratio of mean methylation values of a plate on mean methylation of the (arbitrarily) 1st plate. Each coefficient is estimated by the median of the 1000 ratios of methylation values. These multiplicative coefficients are then used on all probes to model batch effects on the matrix D of individual convoluted methylation profiles.

sites_sexe indicates the 1397 probes that are correlated with sex in the TCGA dataset. sites_sexe contains the linear regression coefficients used to shift methylation value of female-associated T matrices.

T_brute = readRDS(url("https://zenodo.org/record/3247635/files/T_brut_2.rds"))
head(T_brute)

plate_effect = readRDS(url("https://zenodo.org/record/3247635/files/plate_effect.rds"))
head(plate_effect)

sex_reg = readRDS(url("https://zenodo.org/record/3247635/files/regression_sexe.rds"))
coeff_luad = sex_reg$luad
coeff_lusc = sex_reg$lusc
sites_sexe = colnames(coeff_luad)[coeff_luad[8,] < (0.01) | coeff_lusc[8,] < (0.01)]
coeff_sexe = apply(cbind(coeff_luad[2, ], coeff_lusc[2, ]), 1, mean)
head(sites_sexe)
head(coeff_sexe)

Simulation of A matrix

The function compute_A simulates the A matrix of cell types proportion using a Dirichlet distribution. n is the number of patients, prop define the different cell type proportions (here it is 10% for fibroblast, 60% for cancerous epithelial, 5% for T cells, 15% for control epithelial and 10% for cancerous mesenchyme cells). alpha is equal to the sum of the parameter of the Dirichlet distribution. It is inversely related to variation of proportions in the population. In the simulations, we choose alpha=1 that corresponds to a population with variable proportions.

A <- medepir::compute_A(n = 100, prop = c(0.1, 0.6, 0.05, 0.15, 0.1), alpha = 1)

# plot histogram of proportion for each cell type
rownames(A) <- c("fibroblast", "cancerous epithelial", "T-cells", "epithelial", "cancerous mesenchyme")
dat <- as.data.frame.table(t(A))
colnames(dat) <- c("indiv", "cell_type", "proportion")
ggplot2::ggplot(dat, ggplot2::aes(x = proportion)) + ggplot2::geom_histogram(colour = "black", fill = "white", binwidth = 0.025) +
  ggplot2::facet_grid(cell_type ~ .)

Simulation of D matrix

The function create_exp_grp makes a random experimental dataset. It samples randomly the plate, the sex and/or the age of each patient.

The function compute_D allows to obtain the D matrix according to the D = TA model with all the confounding factors. The sex effect is added directly to the T matrix, while the plate effect is added after matrix multiplication.

Finally, the function add_noise add a random gaussian noise to the D matrix.

exp_grp = medepir::create_exp_grp(n = 100, plates = plate_effect[, 1], 
                                  sex = TRUE, age = FALSE)

D = medepir::compute_D(A_mat = A, T_mat = T_brute, exp_grp, sites_sex = sites_sexe, 
              coeff_sex = coeff_sexe, plate_effect = plate_effect)

D_noise = medepir::add_noise(data = D, mean = 0)

Deconvolution of the D matrix

We use the package RefFreeEwas to make the deconvolution.

results_RFE = medepir::RFE(D_noise, nbcell = 5)

Analysis of the results

The function compare_A computes the Mean Absolute Error between the A matrix used for the simulation and the A matrix computed by the method of deconvolution.

medepir::compare_A(A_r = A, A_est = results_RFE$A)


bcm-uga/medepir documentation built on Oct. 11, 2020, 2 a.m.