Description Usage Arguments Details Value Author(s) Examples
The package is designed to predict genes whose expressions are regulaed by m6A methylation based on a negative binomial regression model with pooling prior information across genes to explicitly estimate the sample-to-sample variability between biological replicates and improve the precision and robustness of regression coefficients, which is suitable for MeRIP-seq data in small sample scenario.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 | m6Aexpress(# input files in BAM format
express_data,
treated_express_data=character(0),
IP_BAM,
INPUT_BAM,
TREATED_IP_BAM=character(0),
TREATED_INPUT_BAM=character(0),
# annotation file in GTF format or TXDB format
annot_type="hg19",
species="human",
GENE_ANNO_GTF=NULL,
isGTFAnnotationFile=FALSE,
TXDB=NA,
# specific parameters
GENOME = NA,
UCSC_TABLE_NAME = "knownGene",
isPairedEnd = FALSE,
mode="DE-DM",
CV_values = 0.3,
num_sample_subgroup,
# threshod cutoff
pvalue=NA,
FDR=0.05,
diff_gene_pvalue=NA,
diff_gene_fdr=0.05,
diff_peak_pvalue=NA,
diff_peak_fdr=0.05,
# output direction
OUTPUT_DIR= NA
)
|
express_data |
a character vector giving names of input samples containing read mapping results in BAM format. The input files could be the conventional RNA-seq data or the INPUT aligned MeRIP-seq data, which are used to quantify as the gene expression. |
treated_express_data |
a character vector giving names of input treated samples containing read mapping results in BAM format. The treated samples could be the conventional RNA-seq data or the INPUT aligned MeRIP-seq data, which are used to quantify as the gene expression under treated condition. These files are only provided in differential expression context. |
IP_BAM |
a character vector giving names of IP samples in MeRIP-seq data, which contained read mapping results in BAM format. |
INPUT_BAM |
a character vector giving names of INPUT samples in MeRIP-seq data, which contained read mapping results in BAM format. |
TREATED_IP_BAM |
a character vector giving names of treated IP samples in MeRIP-seq data, which contained read mapping results in BAM format. These files are only provided in differential methylation context. |
TREATED_INPUT_BAM |
a character vector giving names of treated INPUT samples in MeRIP-seq data, which contained read mapping results in BAM format. These files are only provided in differential methylation context. |
annot_type |
a character string specifying an in-built annotation used to quantify each gene's reads count for expression data. It has four possible values including |
species |
a character string specifying the name of species, which could be |
GENE_ANNO_GTF |
A character string giving name of a user-provided annotation file in GTF format. Note that |
isGTFAnnotationFile |
logical indicating whether the annotation provided via the |
mode |
A string, such as |
CV_values |
A decimal number, which is the cutoff for coefficient of variation (CV) used to select high variable peaks. The high variable peaks are defined as CV values are bigger than the cutoff: default |
num_sample_subgroup |
A vector of integer, which specifies the number of samples for each sub-group. For example, |
TXDB |
An optional TxDb object for gene annotation information used in the analysis, default: NA. Please refere to "GenomicFeatures" package for more details about the "TxDb" object. |
GENOME |
A string,such as "hg19" or "mm10", which specifies the genome assembly used. If a gene annotation file is provided, this function will not use it directly; otherwise, this function will download the gene annotation from UCSC using the genome assembly specified here and the gene annotation table specified in "UCSC_TABLE_NAME". |
UCSC_TABLE_NAME |
A string, which gives the gene annotation used from UCSC, default: "knownGene". Please use function: supportedUCSCtables() to check available tables. Some tables may not be available for all genomes, and the "refGene" table does not work correctly due to multiple occuences of the same transcript on the same chromosome. |
pvalue |
a decimal number, which specifies the p-value cut-off in the predicting m6A regulated expression gene of m6A-express model. |
FDR |
a decimal number, which specifies the FDR cut-off in the predicting m6A regulated expression gene of m6A-express model, default: 0.05. |
diff_gene_pvalue |
a decimal number, which specifies the p-value cut-off to identify differential expression gene in |
diff_gene_fdr |
a decimal number, which specifies the fdr cut-off to identify differential expression gene in |
diff_peak_pvalue |
a decimal number, which specifies the p-value cut-off to identify differential methylation peak sites in |
diff_peak_fdr |
a decimal number, which specifies the fdr cut-off to identify differential methylation peak sites in |
OUTPUT_DIR |
A string, which specify the output directory, default: OUTPUT_DIR=NA, the output result will save in the current directory. Otherwise, |
In order to analysis the regulation of m6A methylation on gene expression, we develop the R package m6A-express. The inputs of m6A-express are BAM files of RNA-seq data or INPUT BAM files of MeRIP-seq data, which are used to quantify gene expression, and IP BAM files with paired INPUT BAM files that are used to quantify methylation inensity in m6A-express model. For the origianl gene annotation file, such as GTF or TXDB format file, user could provide it or directly use the default in-built annotation file by given the function parameter species
, annot_type
, GENOME
and UCSC_TABLE_NAME = "knownGene"
.
The m6Aexpress
function is an all-in-one command that performs all the core functions of the m6A-express R-package. The core function of m6A_express
achieved two modes to predict methylation regulated expression genes:
1. Predict m6A regulated expression genes under multiple conditions, such as in mutiple tissues context, and we call it basic
mode.
2. Predict the differential methylation peak sites regulated differential expression genes in specific context and we call it DE-DM
mode.
In the basic
mode, m6A-express will output results in specific directory including:
1. Peak calling result botain by exomePeak
R-package, which are detailedly explained in exomePeak
R-package.
2. The paired reads count and methylation intensity for each gene in tab-delimited text file.
3. The predicting m6A methylation regulated expression (m6A-reg-exp) genes under given cut-off in .XLS (tab-delimited) format file. The .XLS file includes five columns: the name of m6A-reg-exp gene, Beta0, Beta1, pvalue and FDR.
gene_name |
The name of m6A methylation regulated expression gene in the given annotation file. |
Beta0 |
Beta0 is a gene-specific intercept and models the baseline log gene expression. |
Beta1 |
Beta1 captures the influence of m6A methylation on gene expression. |
pvalue |
pvalue of the predicted m6A-reg-exp genes. |
FDR |
FDR of the predicted m6A-reg-exp genes. |
In the DE-DM
mode, m6A-express will output results in specific directory including:
1. Consistent peak sites information between two conditions obtained by exomePeak
R-package, which are detailedly explained in exomePeak
R-package.
2. Differential expression (DE) gene as .XLS file on disk in the specified directory given DE cut-off (e.g. diff_gene_pvalue=0.05
or diff_gene_fdr=0.05
).
3. The differential m6A methylation (DM) peak sites information saved as .XLS file in the specified directory given DM cut-off (e.g. diff_peak_pvalue=0.05
or diff_peak_fdr=0.05
).
4. The paired reads count and methylation intensity for each differential expression gene with differential methylation peaks (DE-DM) in tab-delimited text file.
5. The predicting m6A methylation regulated expression (m6A-reg-exp) genes out from DE-DM genes under given cut-off (e.g. pvalue=0.05
or FDR=0.05
) in .XLS (tab-delimited) format file. The .XLS file includes five columns: the name of m6A-reg-exp gene, Beta0, Beta1, pvalue and FDR.
In the HVP
mode, m6A-express will output results in specific directory including:
1. Consistent peak sites information across multiple conditions obtained by exomePeak
R-package, which are detailedly explained in exomePeak
R-package.
2. The High varialbe peak sites information as .xlsx data format.
3. The paired reads count and methylation intensity for each differential expression gene with high variable peaks (HVP) in tab-delimited text file.
4. The predicting m6A methylation regulated expression (m6A-reg-exp) genes out from HVP genes under given cut-off (e.g. pvalue=0.05
or FDR=0.05
) in .XLS (tab-delimited) format file. The .XLS file includes five columns: the name of m6A-reg-exp gene, Beta0, Beta1, pvalue and FDR.
Teng Zhang <tengzhang126@163.com>
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 | ## Not run:
# Obtain BAM files
f1 <- system.file("extdata", "IP1.bam", package="m6Aexpress")
f2 <- system.file("extdata", "IP2.bam", package="m6Aexpress")
f3 <- system.file("extdata", "IP3.bam", package="m6Aexpress")
f4 <- system.file("extdata", "IP4.bam", package="m6Aexpress")
f5 <- system.file("extdata", "Input1.bam", package="m6Aexpress")
f6 <- system.file("extdata", "Input2.bam", package="m6Aexpress")
f7 <- system.file("extdata", "Input3.bam", package="m6Aexpress")
f8 <- system.file("extdata", "Input4.bam", package="m6Aexpress")
# Input the annotation file
gtf <- system.file("extdata", "hg19toy.gtf", package="m6Aexpress")
# In model="DE-DM"
IP_BAM <- c(f1,f2)
TREATED_IP_BAM <- c(f3,f4)
INPUT_BAM <- c(f5,f6)
TREATED_INPUT_BAM <- c(f7,f8)
## Predict m6A-reg-exp genes out from DE-DM genes.
m6A_reg_exp_gene <- m6Aexpress(express_data=INPUT_BAM, treated_express_data=TREATED_INPUT_BAM, IP_BAM=IP_BAM, TREATED_IP_BAM=TREATED_IP_BAM, INPUT_BAM=INPUT_BAM, TREATED_INPUT_BAM=TREATED_INPUT_BAM,annot_type="hg19", GENE_ANNO_GTF=gtf,isGTFAnnotationFile=TRUE, pvalue=0.05,mode="DE-DM")
## End(Not run)
## Not run:
# In model="HVP"
IP_BAM <- c(f1,f2,f3,f4)
INPUT_BAM <- c(f5,f6,f7,f8)
# Predict m6A-reg-exp gene by m6A-express model in "HVP" context
m6A_reg_exp_gene <- m6Aexpress(express_data=INPUT_BAM, IP_BAM=IP_BAM, INPUT_BAM=INPUT_BAM, annot_type="hg19", GENE_ANNO_GTF=gtf,isGTFAnnotationFile=TRUE, pvalue=0.05,mode="HVP", CV_values = 0.3, num_sample_subgroup=c(2,2))
## End(Not run)
|
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.