processForSegmentation: Process read counts from BAM files to prepare input for...
In sdchandra/CNAclinic: A Software Suite for Shallow Sequencing Copy Number Analysis

processForSegmentation

R Documentation

Process read counts from BAM files to prepare input for segmentation algorithms

Description

processForSegmentation is a wrapper function that reads BAM files and then bins, filters, bias-corrects, smooths and normalizes the read counts using functions from the QDNAseq package.
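For orientation, the steps named above correspond roughly to the standard QDNAseq workflow. The sketch below is not the CNAclinic source: it simply chains QDNAseq functions in the order described, and the mapping from this function's defaults to QDNAseq arguments is an assumption. `qdnaseqStepsSketch` is a hypothetical name used only for illustration.

```r
# Sketch only: approximates the described steps with QDNAseq calls.
# The function name and the argument mapping are assumptions, not CNAclinic code.
qdnaseqStepsSketch <- function(bamfiles, binSize = 50) {
  if (!requireNamespace("QDNAseq", quietly = TRUE)) {
    stop("The QDNAseq package is required for this sketch.")
  }
  # Binning: fetch pre-made hg19 bin annotations, then count reads per bin
  bins   <- QDNAseq::getBinAnnotations(binSize = binSize, genome = "hg19")
  counts <- QDNAseq::binReadCounts(bins, bamfiles = bamfiles)

  # Filtering: residuals, blacklist, mappability and unwanted chromosomes
  counts <- QDNAseq::applyFilters(counts, residual = TRUE, blacklist = TRUE,
                                  mappability = 15,
                                  chromosomes = c("X", "Y", "M", "MT"))

  # Bias correction: loess fit on GC content and mappability, then correct
  counts <- QDNAseq::estimateCorrection(counts, span = 0.65,
                                        family = "symmetric")
  cn <- QDNAseq::correctBins(counts, method = "ratio")

  # Normalization and outlier smoothing
  cn <- QDNAseq::normalizeBins(cn, method = "median")
  cn <- QDNAseq::smoothOutlierBins(cn, logTransform = TRUE)
  cn
}
```

Defining the function does not require QDNAseq to be installed; calling it does, along with real BAM files and the hg19 bin annotations.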

Usage

processForSegmentation(bamfiles = NULL, bamnames = NULL,
  refSamples = NULL, pathToBams = NULL, ext = "bam", binSize = NULL,
  genome = "hg19", outputType = "CNAclinicData",
  typeOfPreMadeBins = "SR50", userMadeBins = NULL,
  cache = getOption("QDNAseq::cache", FALSE), minMapq = 20,
  pairedEnds = NULL, isPaired = NA, isProperPair = NA,
  isUnmappedQuery = FALSE, hasUnmappedMate = NA, isMinusStrand = NA,
  isMateMinusStrand = NA, isFirstMateRead = NA, isSecondMateRead = NA,
  isSecondaryAlignment = NA, isDuplicate = FALSE, residualFilter = TRUE,
  blacklistFilter = TRUE, mappabilityFilter = 15,
  chromosomesFilter = c("X", "Y", "M", "MT"), spanForLoess = 0.65,
  familyForLoess = "symmetric", maxIterForCorrection = 1,
  cutoffForCorrection = 4, variablesForCorrection = c("gc", "mappability"),
  methodOfCorrection = "ratio", methodOfNormalization = "median",
  logTransformForSmoothing = TRUE, skipMedianNormalization = FALSE,
  skipOutlierSmoothing = FALSE, saveCountData = FALSE,
  filename = "corrected_QDNAseqCopyNumbers")

Arguments

`bamfiles`	A `character` vector of BAM file names with or without full path. If NULL (default), all files with extension `.bam` are read from the directory given by `pathToBams`.
`bamnames`	An optional `character` vector of sample names. Defaults to file names with extension `.bam` removed. `bamnames` must be provided if `refSamples` is not NULL.
`refSamples`	An optional `character` vector of the reference sample names that are to be used in normalizing each sample in `bamnames`. The default is NULL. If not NULL, `refSamples` must be the same length as `bamnames` and should only include sample names contained in `bamnames`. See vignette for further details.
`pathToBams`	If `bamfiles` is NULL, all files ending with ".bam" extension will be read from this path. If NULL, defaults to the current working directory.
`ext`	Input file extension. Defaults to "bam".
`binSize`	A `numeric` scalar specifying the width of the bins in units of kbp (1000 base pairs), e.g. `binSize=50` corresponds to 50 kbp bins.
`genome`	Genome build used to align sequencing reads. Currently, CNAclinic only allows `"hg19"` (default). See also `userMadeBins`.
`outputType`	Return an object of class `"QDNAseqCopyNumbers"` or `"CNAclinicData"` (default).
`typeOfPreMadeBins`	A `character` string to specify the read type (single/paired) and length used to generate the pre-made annotation, e.g. `"SR50"` (default) or `"PE100"`.
`userMadeBins`	An optional data.frame or an `AnnotatedDataFrame` object containing bin annotations created using the `createBins` function. Consult the QDNAseq vignette for further information.
`cache`	Whether to read and write intermediate cache files, which speeds up subsequent analyses of the same files. Requires packages R.cache and digest (both available on CRAN) to be installed. Defaults to getOption("QDNAseq::cache", FALSE).
`minMapq`	If quality scores exist, the minimum quality score required in order to keep a read (20, default).
`pairedEnds`	A boolean value or vector specifying whether the BAM files contain paired-end data or not.
`isPaired`	A `logical`(1) indicating whether unpaired (FALSE), paired (TRUE), or any (NA, default) read should be returned.
`isProperPair`	A `logical`(1) indicating whether improperly paired (FALSE), properly paired (TRUE), or any (NA, default) read should be returned. A properly paired read is defined by the alignment algorithm and might, e.g., represent reads aligning to identical reference sequences and with a specified distance.
`isUnmappedQuery`	A `logical`(1) indicating whether unmapped (TRUE), mapped (FALSE, default), or any (NA) read should be returned.
`hasUnmappedMate`	A `logical`(1) indicating whether reads with mapped (FALSE), unmapped (TRUE), or any (NA, default) mate should be returned.
`isMinusStrand`	A `logical`(1) indicating whether reads aligned to the plus (FALSE), minus (TRUE), or any (NA, default) strand should be returned.
`isMateMinusStrand`	A `logical`(1) indicating whether mate reads aligned to the plus (FALSE), minus (TRUE), or any (NA, default) strand should be returned.
`isFirstMateRead`	A `logical`(1) indicating whether the first mate read should be returned (TRUE) or not (FALSE), or whether mate read number should be ignored (NA, default).
`isSecondMateRead`	A `logical`(1) indicating whether the second mate read should be returned (TRUE) or not (FALSE), or whether mate read number should be ignored (NA, default).
`isSecondaryAlignment`	A `logical`(1) indicating whether alignments that are primary (FALSE), are not primary (TRUE) or whose primary status does not matter (NA, default) should be returned.
`isDuplicate`	A `logical`(1) indicating whether non-duplicate (FALSE, default), duplicate (TRUE), or any (NA) reads should be returned.
`residualFilter`	Either a `logical` specifying whether to filter based on loess residuals of the calibration set or, if numeric, the number of standard deviations to use as the cutoff. Default is TRUE, which corresponds to 4.0 standard deviations.
`blacklistFilter`	Either a `logical` specifying whether to filter based on overlap with ENCODE blacklisted regions or, if numeric, the maximum percentage of overlap allowed. Default is TRUE, which corresponds to no overlap allowed (i.e. a value of 0).
`mappabilityFilter`	A `numeric` in `[0,100]` to specify filtering out bins with mappabilities lower than the number specified (15, default). FALSE will not filter based on mappability.
`chromosomesFilter`	A `character` vector specifying which chromosomes to filter out. Defaults to the sex chromosomes and mitochondrial reads, i.e. `c("X", "Y", "M", "MT")`. Use NA to use all chromosomes.
`spanForLoess`	The `span` parameter (alpha) of `stats::loess`, which controls the degree of smoothing.
`familyForLoess`	The `family` parameter of `stats::loess`: if "gaussian", fitting is by least-squares, and if "symmetric", a re-descending M estimator is used with Tukey's biweight function.
`maxIterForCorrection`	An integer(1) specifying the maximum number of iterations to perform (1, default). If larger, after the first loess fit, bins with median residuals larger than `cutoffForCorrection` are removed and the fitting is repeated until the list of bins to use stabilizes or `maxIterForCorrection` iterations have been run.
`cutoffForCorrection`	A numeric(1) specifying the number of standard deviations (as estimated with `matrixStats::madDiff`) to use as the cutoff; bins with median residuals larger than the cutoff are removed. Not used if `maxIterForCorrection = 1` (default).
`variablesForCorrection`	A character vector specifying which variables to include in the correction. Can be c("gc", "mappability") (the default) or "gc", or "mappability".
`methodOfCorrection`	A `character` string specifying the correction method. `ratio` (default) divides `counts` by `fit`. `median` calculates the median `fit`, and defines the correction for bins with GC content `gc` and mappability `map` as `median(fit) - fit(gc,map)`, which is added to `counts`. Method `none` leaves `counts` untouched.
`methodOfNormalization`	A `character` string specifying the normalization method. Choices are "mean", "median" (default), or "mode".
`logTransformForSmoothing`	If TRUE (default), data will be log2-transformed for smoothing.
`skipMedianNormalization`	If TRUE, skip the median normalization step. Recommended when normalizing by `refSamples`.
`skipOutlierSmoothing`	If TRUE, skip the outlier smoothing step.
`saveCountData`	Save an object of class QDNAseqCopyNumbers after the GC/mappability correction step. Default is FALSE.
`filename`	File name under which to save the aforementioned object.
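The three `methodOfCorrection` options can be illustrated with a little arithmetic. This is a minimal base R sketch with made-up values: `counts` and `fit` stand in for per-bin read counts and the fitted loess values, and `correctCountsSketch` is a hypothetical helper, not a CNAclinic or QDNAseq function. The real implementation may treat edge cases differently.

```r
# Toy per-bin read counts and loess-fitted values (illustrative only)
counts <- c(100, 120, 80, 90)
fit    <- c(50, 60, 40, 45)

# Hypothetical helper mirroring the three documented options
correctCountsSketch <- function(counts, fit,
                                method = c("ratio", "median", "none")) {
  method <- match.arg(method)
  switch(method,
    ratio  = counts / fit,                  # divide counts by the fit
    median = counts + (median(fit) - fit),  # add median(fit) - fit
    none   = counts                         # leave counts untouched
  )
}

correctCountsSketch(counts, fit, "ratio")   # 2 2 2 2
correctCountsSketch(counts, fit, "median")  # 97.5 107.5 87.5 92.5
correctCountsSketch(counts, fit, "none")    # 100 120 80 90
```

Here `median(fit)` is 47.5, so the `median` method shifts each bin by `47.5 - fit`, whereas `ratio` rescales each bin relative to its fitted value.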

Value

Returns an object of class CNAclinicData (default) or QDNAseqCopyNumbers, as selected by `outputType`.

Author(s)

Dineika Chandrananda

Examples

     ## Not run: 
      vignette("CNAclinic")
     
## End(Not run)
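As a complement to the vignette, the following sketch shows one way a call with `refSamples` might be prepared. The file and sample names are hypothetical, and `checkRefSamplesSketch` is an illustrative helper that checks only the rules stated under Arguments; the vignette remains the reference for what `refSamples` may contain. The actual call is attempted only if CNAclinic is installed and the files exist.

```r
# Hypothetical file names; sample names derived by stripping ".bam"
bamfiles   <- c("tumour_A.bam", "tumour_B.bam", "normal_A.bam")
bamnames   <- sub("\\.bam$", "", basename(bamfiles))
refSamples <- c("normal_A", "normal_A", "normal_A")

# Illustrative check of the documented constraints (not part of CNAclinic)
checkRefSamplesSketch <- function(bamnames, refSamples) {
  if (is.null(refSamples)) return(invisible(TRUE))
  if (is.null(bamnames)) {
    stop("`bamnames` must be provided when `refSamples` is not NULL")
  }
  if (length(refSamples) != length(bamnames)) {
    stop("`refSamples` must be the same length as `bamnames`")
  }
  unknown <- setdiff(refSamples, bamnames)
  if (length(unknown) > 0) {
    stop("`refSamples` contains names not in `bamnames`: ",
         paste(unknown, collapse = ", "))
  }
  invisible(TRUE)
}

checkRefSamplesSketch(bamnames, refSamples)

# Run the real function only when the package and input files are available
if (requireNamespace("CNAclinic", quietly = TRUE) &&
    all(file.exists(bamfiles))) {
  processed <- CNAclinic::processForSegmentation(
    bamfiles = bamfiles, bamnames = bamnames, refSamples = refSamples,
    binSize = 50, skipMedianNormalization = TRUE
  )
}
```

`skipMedianNormalization = TRUE` follows the recommendation given for that argument when normalizing by `refSamples`.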

sdchandra/CNAclinic documentation built on Aug. 8, 2024, 4:08 p.m.
