View source: R/CountPeakReads.R
CountPeakReads | R Documentation |
Generate normalized counts in genomic regions of interest (for example, ChIPseq peaks).
CountPeakReads(
peaks,
bamFiles,
bamNames = bamFiles,
chips = bamNames,
inputs,
width = 500,
minOverlap = 1,
PairedEnd = FALSE,
minMQS = 255,
strand = 0,
splitOnly = FALSE,
nonSplitOnly = FALSE,
readExtension3 = 0,
readShiftSize = 0,
requireBothEndsMapped = FALSE,
read2pos = 5,
pseudocount = 8,
mode = "F",
genome = "BSgenome.Mmusculus.UCSC.mm10"
)
peaks |
A GRanges object containing your regions of interest. Must include seqnames (chromosomes), start, end, strand, and name. |
bamFiles |
Character vector containing the filenames (including the full path) of read alignment files in bam format. |
bamNames |
Character vector containing the names to describe the |
chips |
Character vector containing the names of the ChIP samples, corresponding exactly to the ChIP sample names in |
inputs |
Character vector containing the names of the Input samples corresponding exactly to the Input sample names in |
width |
Integer scalar providing the width around the center of the peaks GRanges object provided in which the reads will be counted,default = 0. If width=0, the provided regions will remain the original size. |
minOverlap |
Integer scalar giving the minimum number of overlapping bases required for assigning a read to a genomic region (provided in |
PairedEnd |
Logical scalar, indicating wether reads in bam files were generated with paired-end or single-end sequencing. Default is FALSE (=single-end). |
minMQS |
Integer scalar, specifying the minimum mapping quality that a read must have to be included. Default is 255, which eliminates multimapping
reads in case the STAR aligner was used to generate the |
strand |
Integer vector indicating if strand-specific read counting should be performed. Length of the vector should be either 1 (meaning that the value is applied to all input files), or equal to the total number of input files provided. Each vector element should have one of the following three values: 0 (unstranded), 1 (stranded) and 2 (reversely stranded). Default value of this parameter is 0 (ie. unstranded read counting is performed for all input files). |
splitOnly |
Logical scalar indicating whether only split alignments (their CIGAR strings contain letter 'N') should be included. FALSE by default. |
nonSplitOnly |
Logical scalar indicating whether only non-split alignments (their CIGAR strings do not contain letter 'N') should be included. FALSE by default. |
readExtension3 |
Integer scalar giving the number of bases extended downstream from 3' end of each read. 0 by default. Negative value is not allowed. |
readShiftSize |
Integer scalar specifying the number of bases the reads will be shifted downstream by. 0 by default. Negative value is not allowed. In case of mode = "Q" and PairedEnd = TRUE, it should be set to 'halfInsert' to shift each read by half the fragment size. |
requireBothEndsMapped |
Logical scalar indicating if both ends from the same fragment are required to be successfully aligned before the fragment can be assigned to a feature or meta-feature. This parameter is only appliable when PairedEnd is TRUE. |
read2pos |
Specifying whether each read should be reduced to its 5' most base or 3' most base. It has three possible values: NULL, 5 (denoting 5' most base) and 3 (denoting 3' most base). Default value is 5, ie. only the 5' end of the read will be counted. If a read is reduced to a single base, only that base will be considered for the read assignment. Read reduction is performed after read shifting and extension. |
pseudocount |
Numeric scalar providing the pseudocount that is added to the read counts of Chip and Input before normalization. Default is 8. |
mode |
Specify if you want to use QuasR (mode = "Q") or FeatureCounts (mode = "F", default) to count the number of reads per region |
genome |
The reference genome, either a string specifying a BSgenome object or the name of a genome fasta file. Default is "BSgenome.Mmusculus.UCSC.mm10". |
Takes regions of interest in the genome (peaks
), for example ChIP peaks, or transcription start sites.
Then the number of reads in each one of the prvovided bamFiles
is counted. The read counts in a ChIP sample (provided in chips
)
are normalized to the read counts in the corresponding Input sample (provided in the same position in inputs
as the corresponding
ChIP sample in chips
), if provided, taking the total library sizes into account.
If no Input sample names are provided, the counts per milliion reads (cpm) for each ChIP sample is returned.
A list of 2 matrices. Both matrices contain a column for each chip sample provided, and the same number of rows as regions provided in the GRanges object.
The first matrix contains the (Input) normalized number of reads in each region (log2 of Chip + pseudocount
/Input + pseudocount
).
If no Input samples are provided, then the first matrix contains the counts per million (cpm) values for the ChIP samples.
The second matrix contains the total number of reads in the the chip sample, or in the chip plus the input sample, if an input sample is provided.
peaks <- SimulatePeaks(1000,rep(100,1000),chromosomeSizes=
system.file("extdata", "chrNameLength_mm10_chr11.txt", package = "MiniChip"))
bamFiles <- list.files(system.file("extdata",
package = "MiniChip"), full.names=TRUE,pattern="*bam$")
bamNames <- gsub(paste(system.file("extdata",
package = "MiniChip"),"/",sep=""),"",bamFiles)
bamNames <- gsub("_chr11.bam","",bamNames)
CountPeakReads(peaks=peaks,bamFiles=bamFiles,
bamNames=bamNames,chips=bamNames[1:3],inputs=bamNames[4:6])
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.