optimalBinsize: Assess optimal genomic bin size to partition read counts.
In sdchandra/CNAclinic: A Software Suite for Shallow Sequencing Copy Number Analysis

optimalBinsize

R Documentation

Description

Calculates Akaike's information criterion (AIC) and the cross-validation (CV) log-likelihood to infer the optimal bin size for partitioning read counts across the genome.

Usage

optimalBinsize(bamfiles = NULL, bamnames = NULL, pathToBams = NULL,
  binSizes = c(10, 30, 50, 100, 250, 500, 750, 1000), measure = "CV",
  lineColor = "red4", chromosomesFilter = c("X", "Y", "M", "MT"),
  savePlot = FALSE, plotPrefix = "optimalBinsize", minMapq = 20,
  isPaired = NA, isProperPair = NA, isUnmappedQuery = FALSE,
  hasUnmappedMate = NA, isMinusStrand = NA, isMateMinusStrand = NA,
  isFirstMateRead = NA, isSecondMateRead = NA, isSecondaryAlignment = NA,
  isDuplicate = FALSE)

Arguments

`bamfiles`	A `character` vector of BAM file names, with or without the full path. If NULL (default), all files with the extension `.bam` are read from the directory given by `pathToBams`.
`bamnames`	An optional `character` vector of sample names. Defaults to file names with extension `.bam` removed.
`pathToBams`	If `bamfiles` is NULL, all files with the `.bam` extension are read from this path.
`binSizes`	A `numeric` vector of genomic bin sizes in kilobase pairs (1 kbp = 1000 base pairs), e.g. `binSizes = c(10, 30, 50)` corresponds to bins of 10, 30 and 50 kbp.
`measure`	The goodness-of-fit criterion (AIC or CV). Defaults to "CV".
`lineColor`	Line color to use in plot.
`chromosomesFilter`	A `character` vector specifying which chromosomes to filter out. Defaults to the sex chromosomes and mitochondrial reads, i.e. `c("X", "Y", "M", "MT")`. Set to NA to keep all chromosomes.
`savePlot`	If TRUE, saves the plot for each sample to the working directory. Defaults to FALSE.
`plotPrefix`	Prefix for the plot title and PDF file name. Defaults to "optimalBinsize".
`minMapq`	If quality scores exist, the minimum quality score required to keep a read. Defaults to 20.
`isPaired`	A `logical`(1) indicating whether unpaired (FALSE), paired (TRUE), or any (NA, default) read should be returned.
`isProperPair`	A `logical`(1) indicating whether improperly paired (FALSE), properly paired (TRUE), or any (NA, default) read should be returned.
`isUnmappedQuery`	A `logical`(1) indicating whether unmapped (TRUE), mapped (FALSE, default), or any (NA) read should be returned.
`hasUnmappedMate`	A `logical`(1) indicating whether reads with mapped (FALSE), unmapped (TRUE), or any (NA, default) mate should be returned.
`isMinusStrand`	A `logical`(1) indicating whether reads aligned to the plus (FALSE), minus (TRUE), or any (NA, default) strand should be returned.
`isMateMinusStrand`	A `logical`(1) indicating whether mate reads aligned to the plus (FALSE), minus (TRUE), or any (NA, default) strand should be returned.
`isFirstMateRead`	A `logical`(1) indicating whether the first mate read should be returned (TRUE) or not (FALSE), or whether mate read number should be ignored (NA, default).
`isSecondMateRead`	A `logical`(1) indicating whether the second mate read should be returned (TRUE) or not (FALSE), or whether mate read number should be ignored (NA, default).
`isSecondaryAlignment`	A `logical`(1) indicating whether alignments that are primary (FALSE), are not primary (TRUE) or whose primary status does not matter (NA, default) should be returned.
`isDuplicate`	A `logical`(1) indicating whether un-duplicated (FALSE, default), duplicated (TRUE), or any (NA) reads should be returned.

Details

As a guide, choose bin sizes that have low AIC and/or high CV values and that also contain 30-180 reads per bin on average. This strikes a reasonable balance between the error variability and the bias of the CNA estimates. A much smaller bin size may leave many genomic regions with zero read counts and make the overall analysis uninformative. At the other extreme, a much larger bin size will 'smooth out' some patterns of alteration (i.e. increase bias). The estimation of the optimal bin size is designed for low-coverage sequencing data, so choose sensible values for the `binSizes` argument when the input data is not of shallow whole-genome depth (<10 million reads).
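
The 30-180 reads-per-bin guideline can be checked roughly before running the function. The sketch below is not part of CNAclinic: `expected_reads_per_bin` is a hypothetical helper, the genome length of 3.1e9 bp is an assumed approximate human genome size, and the 5 million reads is an assumed sample depth. It also assumes reads are spread evenly across the genome and ignores any chromosome or read filtering, so treat the numbers as a first approximation only.

```r
# Hypothetical helper (not part of CNAclinic): approximate mean reads per bin,
# assuming reads are spread evenly over a genome of `genome_bp` base pairs.
expected_reads_per_bin <- function(total_reads, bin_kbp, genome_bp = 3.1e9) {
  total_reads * (bin_kbp * 1000) / genome_bp
}

bin_sizes   <- c(10, 30, 50, 100, 250, 500, 750, 1000)  # default binSizes, in kbp
total_reads <- 5e6                                       # assumed sample depth

mean_counts <- expected_reads_per_bin(total_reads, bin_sizes)

# Keep only bin sizes whose approximate mean count lies in the 30-180 range.
candidate_bins <- bin_sizes[mean_counts >= 30 & mean_counts <= 180]

print(data.frame(bin_kbp = bin_sizes, mean_reads = round(mean_counts, 1)))
print(candidate_bins)
```

Bin sizes that pass this rough check are natural candidates to pass as `binSizes`; the AIC or CV values reported by the function then decide among them.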

Value

Returns a list. The first element is a data.frame holding the average read counts for each bin size; the remaining elements are sample-specific ggplot objects.

Author(s)

Dineika Chandrananda

Examples

## Not run: 
vignette("CNAclinic")

## End(Not run)
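
As a further illustration, a call might look like the minimal sketch below. It uses only argument names from the Usage section and the return structure described under Value; the directory `path/to/bams` is a hypothetical placeholder, and the chosen `binSizes` are an arbitrary subset of the defaults. The call is skipped unless CNAclinic is installed and the directory exists.

```r
bam_dir <- "path/to/bams"  # hypothetical directory; replace with your own

# Arguments taken from the Usage section; the values are illustrative only.
call_args <- list(
  pathToBams = bam_dir,
  binSizes   = c(30, 50, 100),  # in kbp
  measure    = "CV",
  savePlot   = FALSE
)

# Run only when the package and the input directory are actually available.
if (requireNamespace("CNAclinic", quietly = TRUE) && dir.exists(bam_dir)) {
  res <- do.call(CNAclinic::optimalBinsize, call_args)

  # Per the Value section: first element is a data.frame of average read counts.
  avg_counts <- res[[1]]
  str(avg_counts)

  # The remaining elements are described as sample-specific ggplot objects.
  sample_plots <- res[-1]
  if (length(sample_plots) > 0) print(sample_plots[[1]])
}
```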

sdchandra/CNAclinic documentation built on Aug. 8, 2024, 4:08 p.m.
