comet: Visualize EWAS results in a genomic region of interest

Description

coMET is an R-based package to visualize EWAS (epigenome-wide association scan) results in a genomic region of interest. The main feature of coMET is to plot the significance level of EWAS results in the selected region, along with the correlation in DNA methylation values between CpG sites in the region. The coMET package generates plots of phenotype association, co-methylation patterns, and a series of annotation tracks.

Usage

comet(mydata.file = NULL, mydata.format = "site", mydata.type = "file",
    mydata.large.file = NULL, mydata.large.format = "site",
    mydata.large.type = "listfile", cormatrix.file = NULL,
    cormatrix.method = "spearman", cormatrix.format = "raw",
    cormatrix.color.scheme = "bluewhitered", cormatrix.conf.level = 0.05,
    cormatrix.sig.level = 1, cormatrix.adjust = "none",
    cormatrix.type = "listfile", mydata.ref = NULL,
    start = NULL, end = NULL, zoom = FALSE, lab.Y = "log",
    pval.threshold = 1e-05, pval.threshold.2 = 0, disp.pval.threshold = 1,
    disp.association = FALSE, disp.association.large = FALSE,
    disp.region = FALSE, disp.region.large = FALSE,
    disp.beta.association = FALSE, disp.beta.association.large = FALSE,
    factor.beta = 0.3, symbols = "circle-fill",
    symbols.large = NA, sample.labels = NULL, sample.labels.large = NULL,
    use.colors = TRUE, disp.color.ref = TRUE, color.list = NULL,
    color.list.large = NULL,
    disp.mydata = TRUE, biofeat.user.file = NULL, biofeat.user.type = NULL,
    biofeat.user.type.plot = NULL,
    genome = "hg19", dataset.gene = "hsapiens_gene_ensembl",
    tracks.gviz = NULL,
    disp.mydata.names = TRUE, disp.color.bar = TRUE, disp.phys.dist = TRUE,
    disp.legend = TRUE, disp.marker.lines = TRUE, disp.cormatrixmap = TRUE,
    disp.pvalueplot = TRUE, disp.type = "symbol", disp.mult.lab.X = FALSE,
    disp.connecting.lines = TRUE, palette.file = NULL, image.title = NULL,
    image.name = "coMET", image.type = NULL, image.size = 3.5,
    fontsize.gviz = 5, font.factor = 1,
    symbol.factor = NULL, print.image = TRUE, connecting.lines.factor = 1.5,
    connecting.lines.adj = 0.01, connecting.lines.vert.adj = -1,
    connecting.lines.flex = 0, config.file = NULL, verbose = FALSE)

Arguments

mydata.file

Name of the info file describing the coMET parameters

mydata.format

Format of the input data in mydata.file. There are 4 different options: site, region, site_asso, region_asso.

mydata.type

Format of mydata.file. There are 2 different options: FILE or MATRIX.

mydata.large.file

Names of additional info files describing the coMET parameters. File names should be comma-separated. This option is optional, but any file supplied must be in tabular format with a header. An additional info file can be a list of CpG sites with or without a beta value (DNA methylation level) or direction sign. A site file must contain the 4 mandatory columns, with headers, in the required order. Beta can be an optional 5th column, given either as a numeric value (positive or negative) or as a direction sign only ("+", "-"). The number of columns and their types are defined by the option mydata.large.format.
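As an illustration of the tabular layout described above, the following sketch writes a small tab-delimited site file with a header, four leading columns and an optional fifth column holding a direction sign. The column names (TargetID, CHR, MAPINFO, Pval, BETA), the probe names and the values are assumptions made up for this example and are not taken from this page; check the package vignette for the header that coMET actually expects.

```r
# Toy site-level info file. All column names and values are hypothetical.
info <- data.frame(
  TargetID = c("cg_example_1", "cg_example_2", "cg_example_3"),
  CHR      = c(2, 2, 2),
  MAPINFO  = c(38290500, 38291200, 38292900),
  Pval     = c(1e-06, 0.03, 0.4),
  BETA     = c("+", "-", "+"),   # optional 5th column: direction sign
  stringsAsFactors = FALSE
)

# Write a tab-delimited file with a header row and no row names.
info_path <- file.path(tempdir(), "example_infofile.txt")
write.table(info, file = info_path, sep = "\t", quote = FALSE,
            row.names = FALSE)

# Read it back to confirm the header and column order survived.
check <- read.table(info_path, header = TRUE, sep = "\t",
                    colClasses = c("character", "numeric", "numeric",
                                   "numeric", "character"))
```

Writing to tempdir() keeps the sketch free of side effects; a real analysis would pass its own file path to mydata.file or mydata.large.file.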

mydata.large.format

Format of additional data to be visualised in the p-value plot. Format should be comma-separated. There are 4 different options for each file: site, region, site_asso, region_asso.

mydata.large.type

Format of mydata.large.file. There are 2 different options: listfile or listdataframe.
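Because the options for additional data take comma-separated values, it can help to build those strings from vectors so that the number of files, formats and symbols stays in step. The sketch below only constructs the strings; the file names are hypothetical and nothing here calls coMET.

```r
# Hypothetical additional data files and their per-file settings.
large_files   <- c("expr_region.txt", "extra_sites.txt")
large_formats <- c("region_asso", "site")
large_symbols <- c("diamond-fill", "circle")

# One entry per file is assumed for every comma-separated option.
stopifnot(length(large_files) == length(large_formats),
          length(large_files) == length(large_symbols))

# Collapse to the comma-separated form, with no spaces after the commas.
mydata.large.file   <- paste(large_files, collapse = ",")
mydata.large.format <- paste(large_formats, collapse = ",")
symbols.large       <- paste(large_symbols, collapse = ",")
```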

cormatrix.file

Name of the raw data file or the pre-computed correlation matrix file. It is mandatory and has to be a file in tabular format with a header.

cormatrix.method

Options for calculating the correlation matrix: spearman, pearson and kendall

cormatrix.format

Format of the input cormatrix.file. There are two options: a raw data file (raw if CpG sites are by column and samples by row, or raw_rev if CpG sites are by row and samples by column) or a pre-computed correlation matrix (cormatrix).
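The difference between the raw and raw_rev orientations can be sketched with base R. This is not coMET's internal code; it only shows, on made-up data, that a matrix with CpG sites by column can be passed straight to cor(), while the reversed orientation has to be transposed first to give the same CpG-by-CpG correlation matrix.

```r
set.seed(1)

# "raw" orientation: 6 samples (rows) by 4 CpG sites (columns). Toy values.
raw <- matrix(runif(24), nrow = 6,
              dimnames = list(paste0("sample", 1:6), paste0("cpg", 1:4)))

# "raw_rev" orientation: the same data with CpG sites by row.
raw_rev <- t(raw)

# cor() correlates columns, so raw_rev is transposed back before use.
cm_raw <- cor(raw, method = "spearman")
cm_rev <- cor(t(raw_rev), method = "spearman")
```

Both results are 4 x 4 matrices indexed by CpG site, which is the shape a pre-computed matrix (the cormatrix format) would be expected to have.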

cormatrix.color.scheme

Color scheme options: heat, bluewhitered, cm, topo, gray, bluetored

cormatrix.conf.level

Alpha level for the confidence interval. Default value = 0.05. The CI bounds are the alpha/2 lower and upper values.

cormatrix.sig.level

Significance level used to visualise the correlation. If the correlation has a p-value under the significance level, the correlation will be colored in "ghostwhite"; otherwise the color depends on the correlation level and the chosen color scheme. Default value = 1.

cormatrix.adjust

Indicates which adjustment for multiple tests should be used: "holm", "hochberg", "hommel", "bonferroni", "BH", "BY", "fdr" or "none". Default value = "none".
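The adjustment names listed for cormatrix.adjust are the same as the methods of base R's p.adjust(). How coMET applies them internally is not stated here, so the following is only a sketch, on simulated data, of computing pairwise correlation p-values with cor.test() and adjusting them with one of those methods.

```r
set.seed(2)

# Toy data: 10 samples by 4 CpG sites.
raw <- matrix(rnorm(40), nrow = 10,
              dimnames = list(NULL, paste0("cpg", 1:4)))

# One Spearman test per pair of CpG sites.
pairs <- combn(colnames(raw), 2)
pvals <- apply(pairs, 2, function(p) {
  cor.test(raw[, p[1]], raw[, p[2]], method = "spearman")$p.value
})
names(pvals) <- paste(pairs[1, ], pairs[2, ], sep = "-")

# Adjust for multiple testing; "none" would return the p-values unchanged.
adj <- p.adjust(pvals, method = "BH")

# Flag pairs under a chosen significance level (0.05 is an arbitrary choice).
significant <- adj < 0.05
```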

cormatrix.type

Format of cormatrix.file. There are 2 different options: listfile or listdataframe.

mydata.ref

The name of the reference omic feature (e.g. CpG site) listed in mydata.file

start

The first nucleotide position to be visualised. It could be bigger or smaller than the first position of our list of omic features.

end

The last nucleotide position to be visualised. It has to be bigger than the value of the option start, but it can be smaller or bigger than the last position of our list of omic features.

zoom

Default: FALSE

lab.Y

Scale of the y-axis. Options: log or ln

pval.threshold

Significance threshold to be displayed as a red dashed line
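To see where a threshold line would sit on each y-axis scale, the p-values and the threshold can be transformed the same way. The sketch assumes that lab.Y = "log" means a negative base-10 logarithm and lab.Y = "ln" a negative natural logarithm; that reading is an assumption and is not spelled out on this page.

```r
pvals <- c(0.5, 1e-03, 1e-05, 2e-08)   # made-up p-values
pval.threshold <- 1e-05                # default from the usage above

# Assumed meaning of the two y-axis scales.
y_log <- -log10(pvals)   # lab.Y = "log"
y_ln  <- -log(pvals)     # lab.Y = "ln"

# Height of the threshold line on each scale.
line_log <- -log10(pval.threshold)
line_ln  <- -log(pval.threshold)
```

The two scales differ only by the constant factor log(10), so the ordering of the points and their position relative to the threshold line is the same on both.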

pval.threshold.2

The second significance threshold, to be displayed as an orange dashed line

disp.pval.threshold

Display only the findings that pass the value put in disp.pval.threshold

disp.association

This logical option works only if mydata.file contains the effect direction (mydata.format=site_asso or region_asso). The value can be TRUE or FALSE. If FALSE (default), each data point in the p-value plot is colored by the co-methylation pattern between that point and the reference site. If TRUE, the effect direction is shown: a positive association uses the color defined with the option color.list, and a negative association uses the opposite color.

disp.association.large

This logical option works only if mydata.large.file contains the effect direction (mydata.large.format=site_asso or region_asso). The value can be TRUE or FALSE. If FALSE (default), each data point in the p-value plot is colored by the co-methylation pattern between that point and the reference site. If TRUE, the effect direction is shown: a positive association uses the color defined with the option color.list.large, and a negative association uses the opposite color.

disp.region

This logical option works only if mydata.file contains regions (mydata.format=region or region_asso). The value can be TRUE or FALSE (default). If TRUE, the genomic element will be shown by a continuous line with the color of the element, in addition to the symbol at the center of the region. If FALSE, only the symbol is shown.

disp.region.large

This logical option works only if mydata.large.file contains regions (mydata.large.format=region or region_asso). The value can be TRUE or FALSE (default). If TRUE, the genomic element will be shown by a continuous line with the color of the element, in addition to the symbol at the center of the region. If FALSE, only the symbol is shown.

disp.beta.association

This logical option works only if mydata.file contains the effect direction (mydata.format=site_asso or region_asso). The value can be TRUE or FALSE: if FALSE (default), each data point in the p-value plot is drawn at the default symbol size; if TRUE, the effect direction is shown.

disp.beta.association.large

This logical option works only if mydata.large.file contains the effect direction (mydata.large.format=site_asso or region_asso). The value can be TRUE or FALSE: if FALSE (default), each data point in the p-value plot is drawn at the default symbol size; if TRUE, the effect direction is shown.

factor.beta

Factor to visualise the size of beta. Default value = 0.3.

symbols

The symbol shown in the p-value plot. Options: circle, square, diamond, triangle. Symbols can be filled by appending -fill, e.g. square-fill. Example: circle,diamond-fill,triangle

symbols.large

The symbol used to visualise the data defined in mydata.large.file. Options: circle, square, diamond, triangle. Symbols can be filled by appending -fill, e.g. square-fill. Example: circle,diamond-fill,triangle

sample.labels

Labels for the sample described in mydata.file to include in the legend

sample.labels.large

Labels for the sample described in mydata.large.file to include in the legend

use.colors

Use the colors defined or use the grey color scheme

disp.color.ref

Logical option, TRUE or FALSE (default: TRUE). If TRUE, the connecting line for the reference probe is drawn in purple; if FALSE, it stays black.

color.list

List of colors for displaying the P-value symbols related to the data in mydata.file

color.list.large

List of colors for displaying the P-value symbols related to the data in mydata.large.file

disp.mydata

Logical option, TRUE or FALSE (default: TRUE). If TRUE, the p-value plot is shown; if FALSE, the plot will be defined by Gviz.

biofeat.user.file

Name of data file to visualise in the tracks. File names should be comma-separated.

biofeat.user.type

Track type, where multiple tracks can be shown (comma-separated): DataTrack, AnnotationTrack, GeneregionTrack.

biofeat.user.type.plot

Format of the plot if the data are shown with the Gviz's function called DataTrack (comma-separated)

genome

The human genome reference file. e.g. "hg19" for Human genome 19 (NCBI 37), "grch37" (GRCh37),"grch38" (GRCh38)

dataset.gene

The gene names from ENSEMBL, e.g. hsapiens_gene_ensembl

tracks.gviz

List of tracks created by Gviz.

disp.mydata.names

Logical option, TRUE or FALSE. If TRUE (default), the names of the CpG sites are displayed.

disp.color.bar

Color legend for the correlation matrix (range -1 to 1). Default: blue-white-red

disp.phys.dist

Logical option, TRUE or FALSE (default: TRUE). Displays the bp distance on the plots.

disp.legend

Logical option, TRUE or FALSE (default: TRUE). Displays the sample labels and corresponding symbols on the lower right side.

disp.marker.lines

Logical option, TRUE or FALSE (default: TRUE). If FALSE, the red line for pval.threshold is not shown.

disp.cormatrixmap

Logical option, TRUE or FALSE (default: TRUE). If FALSE, the correlation matrix is not shown.

disp.pvalueplot

Logical option, TRUE or FALSE (default: TRUE). If FALSE, the p-value plot is not shown.

disp.type

Default: symbol

disp.mult.lab.X

Logical option, TRUE or FALSE (default: FALSE). Displays evenly spaced X-axis labels; up to 5 labels are shown.

disp.connecting.lines

Logical option, TRUE or FALSE (default: TRUE). Displays connecting lines between the p-value plot and the correlation matrix.

palette.file

File that contains the color scheme for the heatmap. Colors are hexadecimal HTML color codes, one color per line. If you do not want to use this option, the colors defined by the option cormatrix.color.scheme are used.
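A palette file of the kind described above (one hexadecimal HTML color code per line) can be generated with base R. The sketch below interpolates an 11-step blue-white-red ramp; the number of steps, the colors and the output file name are arbitrary choices for illustration, and whether coMET expects the leading "#" is an assumption.

```r
# Interpolate 11 colors from blue through white to red.
# colorRampPalette() returns "#RRGGBB" strings.
pal <- colorRampPalette(c("blue", "white", "red"))(11)

# Write one color code per line to a hypothetical palette file.
palette_path <- file.path(tempdir(), "example_palette.txt")
writeLines(pal, palette_path)

# Read it back to confirm the one-color-per-line layout.
pal_check <- readLines(palette_path)
```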

image.title

Title of the plot

image.name

The path and the name of the plot file without extension. The extension will be added by coMET depending on the option image.type.

image.type

Options: pdf or eps

image.size

Default: 3.5 inches. Possible sizes : 3.5 or 7

fontsize.gviz

Font size of writing in annotation track. Default value =5

font.factor

Font size of the sample labels. Range: 0-1

symbol.factor

Size of the symbols. Range: 0-1

print.image

Print image in file or not.

connecting.lines.factor

Length of the connecting lines. Range: 0-2

connecting.lines.adj

Position of the connecting lines horizontally. Negative values shift the connecting lines to the left and positive values shift the lines to the right. Range: -1 to 1; the value -1 means no connecting lines.

connecting.lines.vert.adj

Position of the connecting lines vertically. Can be used to vertically adjust the position of the connecting lines in relation to the CpG-site names. Negative values shift the connecting lines down. Range: -0.5 to 0. The value -1 selects the default for the plot size (-0.5 for 3.5 plot size; -0.7 for 7.5 plot size).

connecting.lines.flex

Adjusts the spread of the connecting lines. Range: 0-2

config.file

Configuration file that holds the values of these options, as an alternative to passing them on the command line. Each line of the file defines one option, with the option name and its value separated by "=". If an option takes multiple values, such as list.tracks or the options for additional data, separate the values with commas and no extra spaces (e.g. list.tracks=geneENSEMBL,CGI,ChromHMM,DNAse,RegENSEMBL,SNP).
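The one-option-per-line, name=value layout of the configuration file can be sketched as follows. The option names come from the usage and from the list.tracks example in the description; the chosen values, the output file name and the small parser are illustrative assumptions (the parser is not coMET's own reader, and it is unknown here whether string values should be quoted).

```r
# A few options written as name=value, one per line, no spaces around "=".
config_lines <- c(
  "mydata.format=site",
  "cormatrix.method=spearman",
  "cormatrix.format=raw",
  "pval.threshold=1e-05",
  "image.type=pdf",
  "list.tracks=geneENSEMBL,CGI,ChromHMM"   # multiple values: commas only
)
config_path <- file.path(tempdir(), "example_config.txt")
writeLines(config_lines, config_path)

# Illustrative parser: split each line at "=" into a named list.
parts <- strsplit(readLines(config_path), "=", fixed = TRUE)
opts  <- setNames(lapply(parts, `[`, 2),
                  vapply(parts, `[`, character(1), 1))

# Multi-valued options split further on commas.
tracks <- strsplit(opts[["list.tracks"]], ",", fixed = TRUE)[[1]]
```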

verbose

Logical option, TRUE or FALSE (default: FALSE, per the usage above). If TRUE, shows comments.

Details

The function can visualize at most 120 omic features.
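Given the 120-feature limit, it may be useful to check the number of features in the plotted region before calling comet. The sketch below uses simulated positions and p-values, restricts them to a region, and keeps the 120 most significant features if there are more. The column names and the choice to keep the smallest p-values are assumptions for illustration, not behaviour of coMET itself.

```r
max_features <- 120   # limit stated in the Details section

set.seed(3)
# Simulated feature table with hypothetical column names.
info <- data.frame(
  MAPINFO = sort(sample(38280000:38310000, 400)),
  Pval    = runif(400)
)

# Keep only the features inside the region to be plotted.
region_start <- 38290160
region_end   <- 38303219
in_region <- info[info$MAPINFO >= region_start &
                  info$MAPINFO <= region_end, ]

# If the region still holds too many features, keep the most significant
# ones and restore genomic order.
if (nrow(in_region) > max_features) {
  in_region <- in_region[order(in_region$Pval), ][seq_len(max_features), ]
  in_region <- in_region[order(in_region$MAPINFO), ]
}
```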

Value

Creates a plot in pdf or eps format, depending on the options image.type and print.image.

Author(s)

Tiphaine Martin

References

http://epigen.kcl.ac.uk/comet/

See Also

comet.web, comet.list

Examples

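library(coMET)

# Locate the example files shipped with the package.
extdata <- system.file("extdata", package = "coMET", mustWork = TRUE)
configfile <- file.path(extdata, "config_cyp1b1_zoom_4comet.txt")
myinfofile <- file.path(extdata, "cyp1b1_infofile.txt")
myexpressfile <- file.path(extdata, "cyp1b1_infofile_exprGene_region.txt")
mycorrelation <- file.path(extdata, "cyp1b1_res37_rawMatrix.txt")

# Genomic region to visualise.
chrom <- "chr2"
start <- 38290160
end <- 38303219
gen <- "hg38"
# Assumption: strand is used below but is never defined in the original
# example; "*" (both strands) is a guess.
strand <- "*"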

if (interactive()) {
    cat("interactive")
    # Build the annotation tracks by querying the online resources.
    genetrack <- genes_ENSEMBL(gen, chrom, start, end, showId = TRUE)
    snptrack <- snpBiomart_ENSEMBL(gen, chrom, start, end,
                dataset = "hsapiens_snp_som", showId = FALSE)
    strutrack <- structureBiomart_ENSEMBL(gen, chrom, start, end,
                strand, dataset = "hsapiens_structvar_som")
    clinVariant <- ClinVarMain_UCSC(gen, chrom, start, end)
    clinCNV <- ClinVarCnv_UCSC(gen, chrom, start, end)
    gwastrack <- GWAScatalog_UCSC(gen, chrom, start, end)
    geneRtrack <- GeneReviews_UCSC(gen, chrom, start, end)
    listgviz <- list(genetrack, snptrack, strutrack, clinVariant,
                 clinCNV, gwastrack, geneRtrack)
    comet(config.file = configfile, mydata.file = myinfofile,
      mydata.type = "file",
      cormatrix.file = mycorrelation, cormatrix.type = "listfile",
      mydata.large.file = myexpressfile, mydata.large.type = "listfile",
      tracks.gviz = listgviz, verbose = FALSE, print.image = FALSE,
      disp.pvalueplot = FALSE)
} else {
    cat("Non interactive")
    # Load the pre-computed track datasets shipped with the package; they are
    # expected to provide the track objects listed below.
    data(geneENSEMBLtrack)
    data(snpBiomarttrack)
    data(ISCAtrack)
    data(strucBiomarttrack)
    data(ClinVarCnvTrack)
    data(clinVarMaintrack)
    data(GWASTrack)
    data(GeneReviewTrack)
    listgviz <- list(genetrack, snptrack, strutrack, clinVariant,
                 clinCNV, gwastrack, geneRtrack)
    comet(config.file = configfile, mydata.file = myinfofile,
      mydata.type = "file",
      cormatrix.file = mycorrelation, cormatrix.type = "listfile",
      mydata.large.file = myexpressfile, mydata.large.type = "listfile",
      tracks.gviz = listgviz, verbose = FALSE, print.image = FALSE,
      disp.pvalueplot = FALSE)
}

coMET documentation built on Nov. 8, 2020, 5 p.m.