getSingleMoleculeMatrices | R Documentation |
Input requires a sampleTable with two columns: FileName contains the path to a BAM file of aligned sequences, and SampleName contains the name of the sample. The function will return a table of all matrix files for all samples and all regions listed in regionGRs, together with some information about the matrices. Matrices contain values between 0 (not methylated) and 1 (methylated), or NA (undefined).
getSingleMoleculeMatrices(
  sampleTable,
  genomeFile,
  regionGRs,
  regionType,
  genomeMotifGR,
  minConversionRate = 0.8,
  maxNAfraction = 0.2,
  bedFilePrefix = NULL,
  path = ".",
  convRatePlots = FALSE,
  nThreads = 1,
  samtoolsPath = "",
  overwriteMatrixLog = FALSE
)
sampleTable |
Table with FileName column listing the full path to bam files belonging to the samples listed in the SampleName column |
genomeFile |
String with path to fasta file with genome sequence |
regionGRs |
A genomic regions object with all regions for which matrices should be extracted. The metadata columns must contain a column called "ID" with a unique ID for that region. |
regionType |
A collective name for this list of regions (e.g. TSS or amplicons) |
genomeMotifGR |
A GenomicRanges object with a unique set of non-overlapping CG, GC and GCGorCGC sites |
minConversionRate |
Minimal fraction of Cs from a non-methylation context that must be converted to Ts for the read to be included in the final matrix (default=0.8) |
maxNAfraction |
Maximum fraction of CpG/GpC positions that can be undefined (default=0.2). Note that because building the matrix is time-consuming, reads are not removed from the matrix using this threshold, so that a different threshold can be applied at a later time. The threshold is only used to give an idea of the read quality in the summary table of the matrices. |
bedFilePrefix |
The full path and prefix of the bed files for C, G, CG and GC positions in the genome (i.e. the path and name of the files without the ".C.bed", ".G.bed", ".CG.bed" or ".GC.bed" suffix). Default is NULL, which assumes the bed files are in the same location as the genome sequence file. |
path |
Path for output. "plots", "csv" and "rds" directories will be created here. Default is current directory. |
convRatePlots |
Boolean value: should bisulfite conversion rate plots be created for each region? (default=FALSE) |
nThreads |
Number of threads for parallelisation (default=1) |
samtoolsPath |
Path to the samtools executable, needed only if samtools is not in the unix $PATH (default="") |
overwriteMatrixLog |
Should the matrixLog file be overwritten (e.g. after a change in analysis or data), or should already computed matrices be reused so that the script skips to the next matrix (e.g. after premature termination of the analysis)? (default=FALSE) |
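As a rough illustration of the inputs described in the arguments above, the sketch below builds a sampleTable with the two required columns and lists the bed files implied by a bedFilePrefix. The file paths, sample names and prefix are hypothetical placeholders, not files shipped with the package.

```r
# Hypothetical BAM paths and sample names; replace with your own.
sampleTable <- data.frame(
  FileName = c("/data/aln/sample1.bam", "/data/aln/sample2.bam"),
  SampleName = c("sample1", "sample2"),
  stringsAsFactors = FALSE
)

# Check that the two columns the function expects are present.
stopifnot(all(c("FileName", "SampleName") %in% colnames(sampleTable)))

# Hypothetical prefix: path and file name without the ".C.bed" etc. suffix.
bedFilePrefix <- "/data/genome/genome"

# The four bed files that this prefix refers to.
bedFiles <- paste0(bedFilePrefix, c(".C.bed", ".G.bed", ".CG.bed", ".GC.bed"))
print(bedFiles)
```

Checking the column names before the call is optional, but it catches a misnamed column before any time-consuming matrix extraction starts.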
A list (by sample) of lists (by regions) of methylation matrices
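Because reads are not removed using maxNAfraction, that filter can be applied afterwards. The following is a minimal sketch on a mock object, assuming the nested list structure described above and assuming each matrix has one row per read and one column per CpG/GpC position. The sample, region, read and position names, and the helper filterReads, are invented for illustration.

```r
# Mock matrix: rows are reads, columns are positions (an assumed layout).
mat <- matrix(
  c(0,  1,  1,  0, 1,
    NA, NA, NA, 0, 1,
    1,  1,  0,  0, 0),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("read1", "read2", "read3"),
                  c("p1", "p2", "p3", "p4", "p5"))
)

# Mock return value: a list (by sample) of lists (by region).
allMats <- list(sample1 = list(region1 = mat))

maxNAfraction <- 0.2

# Hypothetical helper: keep reads whose fraction of undefined
# positions does not exceed the threshold.
filterReads <- function(m, maxNA) {
  m[rowMeans(is.na(m)) <= maxNA, , drop = FALSE]
}

# Apply the filter to every region of every sample.
filtered <- lapply(allMats, function(regions) {
  lapply(regions, filterReads, maxNA = maxNAfraction)
})

# Mean methylation per position, ignoring undefined values.
meanMeth <- colMeans(filtered$sample1$region1, na.rm = TRUE)
print(meanMeth)
```

Keeping the threshold outside the extraction step means a stricter or looser value can be tried without recomputing the matrices.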