README.md
In ChongWu-Biostat/MethyImpute: Imputing the missing covariates via methylation data

MethyImpute

Impute the missing covariates via Methylation Data

We test it on R 3.2.1 in Linux server and 3.2.2 in Windows and Mac. Note that for windows user, we need install Rtools first. For Mac user, we need install gfortran.

library(devtools)
install_github("ChongWu-Biostat/MethyImpute") # install the MethyImpute packages

Note that we simulate four cell type composite and one continuous covariate we are interested in. The methylation data come from three models. The first 50 CpG sites are correlated with the cell type composite and the covariate we are interested in. The 51-250 CpG sites are only correlated with the cell type composite, which is used to validate the type 1 error rate. The 251-400 CpG sites are only correlated with the covariate we are interested in. The 401-1000 CpG sites are drawn from the null distribution.

library(MethyImpute)
## Set the initial conditions
N <- 500 ## Total sample size (subjects)
m <- 1000  ## Total CpG sites
missing.prop = 0.3 ## missing rate
b4.value = 0.02    ## The effect size
seed = 1
data = simulatingData(seed,N,m,missing.prop, b4.value)

Once we have the data, we need pre-process the data to the certain form and then we can apply multiple imputation (MI) based method to impute the missing covariate.

Y = data$reference.Y
pheno = data$pheno
pheno = pheno[,2:5]
reference.index = data$reference.num
missing.index = data$test.num
missing.cov.name = c("cell.1","cell.2","cell.3")
complete.cov.name = "x1"
m=30
maxit = 5
max.refernce.methy = 30
defaultMethod = c("norm", "logreg", "polyreg", "polr")
## Note that this function is kind of time consuming. But it is feasiable to apply it to ARIC data or other real data set. 
imp = methy.mice(Y,pheno,missing.index, reference.index, missing.cov.name,complete.cov.name,max.refernce.methy = 30, m=30,maxit = 5,defaultMethod = c("norm", "logreg", "polyreg", "polr"))

Then we can apply the standard analysis. For example, we can use linear regression to test the relation between the methylation value and the covariate we are interested in as follows.

imputed.b4 = matrix(NA,5,5)
  for ( i in 1:5) {
    fit <- with(imp, lm(as.formula(paste(paste("methy",i,sep=""),"~ cell.1 + cell.2 + cell.3 +x1",sep =""))))
      fit.summary = summary(pool(fit))
      imputed.b4[i,1] = fit.summary[5,1] - 1.96 * fit.summary[5,2]
      imputed.b4[i,2] = fit.summary[5,1]
      imputed.b4[i,3] = fit.summary[5,1] + 1.96 * fit.summary[5,2]
      imputed.b4[i,4] = fit.summary[5,5] # p-vlaue
      imputed.b4[i,5] = fit.summary[5,2]
  }

  colnames(imputed.b4) = c("Lower","Estimated","Uppper","pvalue","SE")

We tailor the method proposed by Houseman et al. to a more general situation. Briefly, the Houseman algorithm identifies 100-300 CpG sites that discriminated cellular composition in sorted normal human cell populations (consisting of B cells, granulocytes, monocytes, NK, and CD4+ and CD8+ T cells). The method fits a linear model at each of these CpG sites using a reference dataset to estimate the coefficient for each cellular component. It then uses a matrix projection approach to map these estimated coefficients to the relative proportions of each cellular component in the samples without cell type composition information. This method can be applied to other phenotypes besides cell type composition, if a reference dataset is available with both DNA methylation data and phenotype of interest. In the case of missing data, we use the set of samples with complete information as the reference dataset, and apply the projection approach to impute the missing values.

Y = data$reference.Y
pheno = data$pheno

#################
reference.N = floor(N*(1-missing.prop))
reference.num = c(1:reference.N)

reference.Y = Y[,reference.num]
reference.pheno =pheno[reference.num,]

test.num = c((reference.N+1):N)
true.pheno = pheno[test.num,]
pheno[test.num,c(1:4)] =NA

our.testData_Assay =Y[,test.num]
our.testData_Pheno =pheno[test.num,]
our.validationData_Assay =  Y[,reference.num]
our.validationData_Pheno = pheno[reference.num,]


# reference.CpG contains the reference methylation data. We can run the single LMM to get the most significant CpG sites for each missing cell type proportion. Here we assume that we already have it.
reference.CpG =reference.Y  # Validation methylation (CpGs x subjects)

# The missing value name you want to impute
missing.name = c("cell.0","cell.1","cell.2","cell.3")


# number of CpG sites you want to use to impute the missing data
missing.n = 200

# The methylation data corresponding to the subjects that cell type proportion is missing
infer.CpG = our.testData_Assay # Inference methylation (CpGs x subjects)

##########################################################
## The main function to impute the missing data       ####
##########################################################
projection.based(reference.CpG,reference.pheno,infer.CpG,missing.name,missing.n = 200)
## NOTE: This function can directly apply to the other missing covariates, such as white blood cell count.

If you like our pakcage MethyImpute, please give us a star. You can download the MethyImpute package manual here.

ChongWu-Biostat/MethyImpute documentation built on May 6, 2019, 11:18 a.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

ChongWu-Biostat/MethyImpute
Imputing the missing covariates via methylation data

README.md
In ChongWu-Biostat/MethyImpute: Imputing the missing covariates via methylation data

MethyImpute

Install the Package

Simulated the Methylation and Covariates

MI Based Method

Projection Based Method

Manual

R Package Documentation

Browse R Packages

We want your feedback!

ChongWu-Biostat/MethyImpute Imputing the missing covariates via methylation data

README.md In ChongWu-Biostat/MethyImpute: Imputing the missing covariates via methylation data

MethyImpute

Install the Package

Simulated the Methylation and Covariates

MI Based Method

Projection Based Method

Manual

R Package Documentation

Browse R Packages

We want your feedback!

ChongWu-Biostat/MethyImpute
Imputing the missing covariates via methylation data

README.md
In ChongWu-Biostat/MethyImpute: Imputing the missing covariates via methylation data