Introduction

The metR package contains tools for comprehensive analysis of data from methylation studies. With the metR one can define regions, get basic statistics about methylation in them, plot methylation rate or rank these regions from the most interesting to the least one.

devtools::install_github("geneticsMiNIng/metR")
library(metR)
library(kableExtra)
library(knitr)
library(metR)
library(kableExtra)
library(knitr)

The data

Data for this example are downloaded from the site: http://www.neuroepigenomics.org/methylomedb/download.html. We used 4 control samples: Control1 AC, Control2 AC, Control3 AC and Control4 AC and 4 disease samples: SCZ1 AC, SCZ2 AC, SCZ3 AC, SCZ4 AC.

In the metR package these samples are available in the schizophrenia data frame. Each row is a single methylation probe. Data from all control/disease samples are combined together.

To speed up calculations and shrink down the size of data we are using only probes from chromosome 1. The R code needed to download data from all samples can be found in the script https://github.com/geneticsMiNIng/metR/blob/master/examples/prep.MethylomeDB.R. The result of this script is a single data.frame schizophrenia that is included in metR package. The last column category indicates if data are from control samples ('control') or from disease samples ('disease').

data('schizophrenia')
kable_styling(kable(head(schizophrenia, 6), "html"), position = "left",
font_size = 9, full_width = F) %>% row_spec(1:6, color = "black")

1. Preprocessing

The data requires some preprocessing. This can be done with the preprocessing() function. In this function we only select data from two given data frames if methylation probe exists in both samples on the same chromosome and position.

control <- schizophrenia %>% filter(category == 'control') %>%
  dplyr::select(-category)

disease <- schizophrenia %>% filter(category == 'disease') %>%
  dplyr::select(-category)

processed_schizophrenia <- preprocessing(control, disease)
kable_styling(kable(head(processed_schizophrenia, 6), "html"), position = "left",
font_size = 9, full_width = F) %>% row_spec(1:6, color = "black")

The preprocessing() function need two probes with specific columns:

kable_styling(kable(head(control, 3), "html"), position = "left",
font_size = 9, full_width = F) %>% row_spec(1:3, color = "black")

So:

kable_styling(kable(head(processed_schizophrenia, 3), "html"), position = "left",
font_size = 9, full_width = F) %>% row_spec(1:3, color = "black") %>%
  column_spec(3, background = "#f21a1a", color = 'white')

By preprocessing we get one data frame where we have both results of methylation in position and chromosome from control and disease samples. Prob columns indicates if data are from control samples (x), or from disease samples (y).

Tiles / regions

All statistics in the metR package are calculated for regions of DNA. Currently the metR package implements two functions that defines regions:

After regions creation they will be tested if there is significant difference between two probes in methylation rate.

tiles_schizophrenia <- create_tiles_max_gap(processed_schizophrenia, gaps.length = 100)

Column tiles indicates id of region in chromosome:

kable_styling(kable(head(tiles_schizophrenia, 3), "html"), position = "left",
font_size = 9, full_width = F) %>% row_spec(1:3, color = "black")  %>%
  column_spec(9, background = "#f21a1a", color = 'white')

It's also possible to use create_tiles_fixed_length function, where tiles.length that specifies maximum difference between minimum and maximum position in the same methylation regions. If common = TRUE function creates second regions group that are min position is (min position + max position)/2 of k-region and max position is (min position + max position) of k+1 region.

tiles_schizophrenia_2 <- create_tiles_fixed_length(processed_schizophrenia, tiles.length = 1000, common = F)
tiles_schizophrenia_3 <- create_tiles_fixed_length(processed_schizophrenia, tiles.length = 1000, common = T)

In output from create_tiles_fixed_length function column tiles also indicates id of region in chromosome and tiles.common indicates second group id region:

kable_styling(kable(head(tiles_schizophrenia_2, 3), "html"), position = "left",
font_size = 9, full_width = F) %>% row_spec(1:3, color = "black") %>%
  column_spec(8, background = "#f21a1a", color = 'white')

kable_styling(kable(head(tiles_schizophrenia_3, 3), "html"), position = "left",
font_size = 9, full_width = F) %>% row_spec(1:3, color = "black")  %>%
  column_spec(8:9, background = "#f21a1a", color = 'white')

Basic statistics

We get basic statistics about two probes by get_stats function. This is helpful if we want check coverage of created regions or methylation difference.

stats <- get_stats(tiles_schizophrenia)
kable_styling(kable(head(stats, 3), "html"), position = "left",
font_size = 9) %>% row_spec(1:3, color = "black") 

We get basic statistics about each region:

We also can join e.g stats and tiles_schizophrenia on chromosome, start and end column and analyzing regions only if they coverage is greater than some specific values or other condition.

Finding DMR

We get interesting regions by using find_DMR function. Data argument is data.frame from create_tiles_fixed_length function or from create_tiles_max_gap function . In methods argument we can type methods which we want to use:

We use 8 methods which are available in metR package:

In following method are compared methylation rate between x and y prob on the same position and chromosome. These methods sorts regions based on p.value from adequate test results:

In Ttest, Wilcoxon and KS test we use two sided alternative hypothesis.

We also implemented regression methods, where number of success are number of methylated citosines and failures are number of unmethylated citosines. Output from this methods is beta coefficient of indicator variable from regression model and criticial value from Wald test on indicator variable. Indicator variable is equal 1 if observations are from x prob and 0 otherwise.

Methods Reg.Log.Beta, Reg.Mixed.Beta, Reg.Corr.Mixed.Beta order regions based on beta coefficients of grouping variable or p.values of grouping variable.

# eval = F
result <- find_DMR(tiles_schizophrenia, methods = c('Wilcoxon', 'Ttest', 'KS', 'Reg.Log', 'Reg.Mixed', 'Reg.Corr.Mixed'))

Results of running function above are included in package. So now we only load data:

data('find.DMR.results')
names(find.DMR.results)

This is a list of data.frames. One data.frame is result of running specific methods. If we want also get results from sorting by beta coefficient we can run:

find.DMR.results <- find_DMR(tiles_schizophrenia, methods = c('Reg.Log', 'Reg.Corr.Mixed', 'Reg.Corr.Mixed')
                    , p.value.log.reg = 0.001, p.value.reg.mixed = 0.001, p.value.reg.corr.mixed = 0.001)

It's better when we use results previously obtained from sorting by p.value of grouping variable because we can spare computing time:

find.DMR.results$Reg.Log.Beta <- find.DMR.results$Reg.Log %>% filter(p.value < 0.001) %>% arrange(-abs(beta.coef))

find.DMR.results$Reg.Mixed.Beta <- find.DMR.results$Reg.Mixed %>% filter(p.value < 0.001) %>% arrange(-abs(beta.coef))

find.DMR.results$Reg.Corr.Mixed.Beta <- find.DMR.results$Reg.Corr.Mixed %>% filter(p.value < 0.001) %>% arrange(-abs(beta.coef))

Now we can select top 100 regions by each method using get_top function:

top <- do.call(gdata::combine,lapply(find.DMR.results, get_top, n = 100, stats = stats))

Plotting methylation

We use draw_methylation function with basic arguments:

i = which.max(top$quantile)
draw_methylation(processed_schizophrenia, top$chr[i], top$start[i], top$end[i], bind.probes = F,
        size.x.dot = 3.5, size.y.dot = 2, plot.title = 13, axis.title.x = 12, axis.title.y = 12, legend.position = 'none',
        axis.text.x = 9, axis.text.y = 9)

Change legend.position to get legend of logarithm reads in each position:

draw_methylation(processed_schizophrenia, top$chr[i], top$start[i], top$end[i], bind.probes = F,
        size.x.dot = 3.5, size.y.dot = 2, plot.title = 13, axis.title.x = 12, axis.title.y = 12, legend.position = 'bottom',
        axis.text.x = 9, axis.text.y = 9, legend.text = 7, legend.title = 8)

Change bind.probes to TRUE argument if you want binding each observations between two probes:

draw_methylation(processed_schizophrenia, top$chr[i], top$start[i], top$end[i], bind.probes = T,
                 smooth.methylation = T,
  size.x.dot = 3.5, size.y.dot = 2, plot.title = 13, axis.title.x = 12, axis.title.y = 12, legend.position = 'none',
        axis.text.x = 9, axis.text.y = 9)

Change smooth.methylation to FALSE if you don't want smoothing observations:

draw_methylation(processed_schizophrenia, top$chr[i], top$start[i], top$end[i], bind.probes = T,smooth.methylation = F,
        size.x.dot = 3.5, size.y.dot = 2, plot.title = 13, axis.title.x = 12, axis.title.y = 12, legend.position = 'none',
        axis.text.x = 9, axis.text.y = 9)

Other arguments are responsible for size of element on plots.

Session info

devtools::session_info()


geneticsMiNIng/metR documentation built on May 28, 2019, 8:41 p.m.