View source: R/anomDetectBAF.R
anomSegmentBAF, for each sample and chromosome, breaks the chromosome up into segments marked by change points of a metric based on B Allele Frequency (BAF) values.
anomFilterBAF selects segments that are likely to be anomalous.
anomDetectBAF is a wrapper that runs anomSegmentBAF and anomFilterBAF in one step.
anomSegmentBAF(intenData, genoData, scan.ids, chrom.ids, snp.ids,
  smooth = 50, min.width = 5, nperm = 10000, alpha = 0.001,
  verbose = TRUE)

anomFilterBAF(intenData, genoData, segments, snp.ids, centromere,
  low.qual.ids = NULL, num.mark.thresh = 15, long.num.mark.thresh = 200,
  sd.reg = 2, sd.long = 1, low.frac.used = 0.1, run.size = 10,
  inter.size = 2, low.frac.used.num.mark = 30, very.low.frac.used = 0.01,
  low.qual.frac.num.mark = 150, lrr.cut = -2, ct.thresh = 10,
  frac.thresh = 0.1, verbose = TRUE,
  small.thresh = 2.5, dev.sim.thresh = 0.1, centSpan.fac = 1.25, centSpan.nmark = 50)

anomDetectBAF(intenData, genoData, scan.ids, chrom.ids, snp.ids,
  centromere, low.qual.ids = NULL, ...)
intenData: An IntensityData object containing the BAF values.
genoData: A GenotypeData object.
scan.ids: vector of scan ids (sample numbers) to process.
chrom.ids: vector of (unique) chromosomes to process. Should correspond to integer chromosome codes in intenData.
snp.ids: vector of eligible SNP ids. Usually exclude failed and intensity-only SNPs. It is also recommended to exclude the HLA region on chromosome 6 and the XTR region on the X chromosome.
smooth: number of markers for the smoothing region. See smooth.CNA in the package DNAcopy.
min.width: minimum number of markers for a segment. See segment in the package DNAcopy.
nperm: number of permutations for deciding significance in segmentation. See segment in the package DNAcopy.
alpha: significance level. See segment in the package DNAcopy.
verbose: logical indicator whether to print information about the scan id currently being processed. anomSegmentBAF prints each scan id; anomFilterBAF prints a message after every 10 samples: "processing ith scan id out of n", where "ith" will be 10, 20, etc. and "n" is the total number of samples.
segments: data.frame of segments from anomSegmentBAF.
centromere: data.frame with centromere position information. Names must include "chrom", "left.base", "right.base". Valid values for "chrom" are 1:22, "X", "Y", "XY". Here "left.base" and "right.base" are the base positions of the start and end of the centromere, in position order. Centromere data tables are available as data sets, e.g. centromeres.hg18 (see the examples).
low.qual.ids: scan ids determined to be low quality, for which some segments are filtered based on more stringent criteria. Default is NULL. The usual choice is scan ids for which the median BAF standard deviation across autosomes is > 0.05.
num.mark.thresh: minimum number of SNP markers in a segment for it to be considered for anomaly.
long.num.mark.thresh: minimum number of markers for a "long" segment to be considered for anomaly; for such segments the significance threshold criterion is allowed to be less stringent.
sd.reg: number of baseline standard deviations by which the segment mean must differ from the "normal" baseline mean to declare the segment anomalous. This number is given by abs(mean of segment - baseline mean)/(baseline standard deviation).
sd.long: same meaning as sd.reg, but applied to "long" segments (see long.num.mark.thresh).
low.frac.used: if the fraction of heterozygous or missing SNP markers relative to the number of eligible SNP markers in a segment is below this value, more stringent criteria are applied to declare the segment anomalous.
run.size: minimum length of a run of missing or heterozygous SNP markers for possible determination of homozygous deletions.
inter.size: number of homozygotes allowed to "interrupt" a run for possible determination of homozygous deletions.
low.frac.used.num.mark: number of markers threshold used together with low.frac.used.
very.low.frac.used: any segments with (num.mark)/(number of markers in interval) less than this are filtered out, since they tend to be false positives.
low.qual.frac.num.mark: minimum num.mark threshold for low quality scans (low.qual.ids).
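lrr.cut: look for runs of Log R Ratio (LRR) values below lrr.cut when adjusting break points for possible homozygous deletions.
ct.thresh: minimum number of LRR values below lrr.cut required.
frac.thresh: investigate an interval for homozygous deletion only if the fraction of LRR values below lrr.cut exceeds this threshold.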
small.thresh: sd.fac threshold used in making merge decisions involving segments with small num.mark.
dev.sim.thresh: relative error threshold for determining similarity in BAF deviations; used in merge decisions.
centSpan.fac: thresholds are increased by this factor when deciding whether to filter or keep together the left and right halves of centromere-spanning segments.
centSpan.nmark: minimum number of markers below which centromere-spanning segments are automatically filtered out.
...: arguments to pass to anomFilterBAF.
anomSegmentBAF uses the function segment from the DNAcopy package to perform circular binary segmentation on a metric based on BAF values. The metric for a given sample/chromosome is sqrt(min(BAF, 1-BAF, abs(BAF-median(BAF)))), where the minimum is taken separately for each marker and the median is across BAF values on the chromosome. Only BAF values for heterozygous or missing SNPs are used.
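A minimal base-R sketch of the metric described above, for one sample on one chromosome. It assumes genotypes are coded as the number of A alleles (0, 1, 2, with NA for a missing call) and, following the wording above literally, takes the median over all non-missing BAF values on the chromosome before keeping heterozygous and missing SNPs. The helper name baf_metric is hypothetical and not part of GWASTools; the package's internal code may differ in detail.

```r
# Hypothetical helper (not a GWASTools function): BAF metric for one
# sample/chromosome. Assumes genotypes coded as A-allele counts 0/1/2,
# with NA for missing calls, and markers in position order.
baf_metric <- function(baf, geno) {
  stopifnot(length(baf) == length(geno))
  # median across the BAF values on the chromosome (assumption: all markers)
  med <- median(baf, na.rm = TRUE)
  # keep only heterozygous (1) or missing (NA) genotype calls
  use <- is.na(geno) | geno == 1
  b <- baf[use]
  # per-marker minimum of BAF, 1 - BAF and |BAF - median|, then square root
  m <- sqrt(pmin(b, 1 - b, abs(b - med)))
  # return the metric with the marker positions it came from,
  # dropping markers whose BAF value is itself missing
  idx <- which(use)
  ok <- !is.na(m)
  data.frame(index = idx[ok], metric = m[ok])
}

baf  <- c(0.50, 0.45, 0.00, 1.00, 0.30, NA)
geno <- c(1,    1,    0,    2,    NA,   1)
baf_metric(baf, geno)
```

In the package, a series like this is what gets passed to DNAcopy's segment; the sketch stops before that step so it runs with base R alone.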
anomFilterBAF determines anomalous segments based on a combination of thresholds for the number of SNP markers in the segment and for deviation from a "normal" baseline (see num.mark.thresh, long.num.mark.thresh, sd.reg, and sd.long). The "normal" baseline metric mean and standard deviation are found across all autosomes not segmented by anomSegmentBAF. This is why it is recommended to include all autosomes in the argument chrom.ids: doing so ensures a more accurate baseline.
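The baseline comparison can be sketched in base R as follows. This is a simplified illustration under stated assumptions, not the GWASTools implementation: baseline_stats and flag_segments are hypothetical names, the baseline is pooled over per-marker metric values on autosomes assumed to be unsegmented, and a segment is flagged when it has at least num.mark.thresh markers and sd.fac >= sd.reg, or at least long.num.mark.thresh markers and sd.fac >= sd.long. The merging, centromere, homozygous-deletion and low-quality rules are left out.

```r
# Hypothetical sketch (not GWASTools code) of the baseline comparison.
# metric: per-marker BAF metric for one sample; chrom: chromosome of each
# marker; unsegmented: autosomes assumed to have no change points.
baseline_stats <- function(metric, chrom, unsegmented) {
  x <- metric[chrom %in% unsegmented]
  c(mean = mean(x, na.rm = TRUE), sd = sd(x, na.rm = TRUE))
}

# Simplified thresholding using sd.fac = |seg.mean - base.mean| / base.sd.
flag_segments <- function(seg, base.mean, base.sd,
                          num.mark.thresh = 15, long.num.mark.thresh = 200,
                          sd.reg = 2, sd.long = 1) {
  sd.fac <- abs(seg$seg.mean - base.mean) / base.sd
  # ordinary segments must pass the stricter sd.reg criterion
  regular <- seg$num.mark >= num.mark.thresh & sd.fac >= sd.reg
  # "long" segments only need to pass the less stringent sd.long criterion
  long <- seg$num.mark >= long.num.mark.thresh & sd.fac >= sd.long
  regular | long
}

seg <- data.frame(num.mark = c(10, 20, 250, 250),
                  seg.mean = c(0.90, 0.50, 0.36, 0.32))
flag_segments(seg, base.mean = 0.30, base.sd = 0.05)
# returns FALSE TRUE TRUE FALSE
```

The third toy segment (sd.fac of about 1.2) is flagged only through the long-segment rule, which is the kind of case long.num.mark.thresh and sd.long are described as covering.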
Some initial filtering is done, including possible merging of consecutive segments that meet the sd.reg threshold along with other criteria (such as not spanning the centromere), and adjustment of break points for possible homozygous deletions so that they are more accurate (see lrr.cut, ct.thresh, frac.thresh, run.size, and inter.size).
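A hedged base-R sketch of the two ingredients named above: finding runs of heterozygous or missing markers that may be interrupted by at most inter.size homozygotes, and checking LRR values against lrr.cut. The function names are hypothetical; treating a run's length as the count of heterozygous/missing markers inside it, and reading frac.thresh as the fraction of LRR values below lrr.cut, are assumptions rather than the package's documented algorithm.

```r
# Hypothetical sketch (not GWASTools code). is_hetmiss: logical vector in
# position order, TRUE where a marker is heterozygous or missing.
find_hetmiss_runs <- function(is_hetmiss, run.size = 10, inter.size = 2) {
  r <- rle(is_hetmiss)
  vals <- r$values
  n <- length(vals)
  # absorb short homozygous stretches (<= inter.size markers) that lie
  # between two het/missing stretches, so they interrupt rather than end a run
  for (i in seq_len(n)) {
    if (!r$values[i] && r$lengths[i] <= inter.size && i > 1 && i < n) {
      vals[i] <- TRUE
    }
  }
  r2 <- rle(rep(vals, r$lengths))
  ends <- cumsum(r2$lengths)
  starts <- ends - r2$lengths + 1L
  # count the het/missing markers actually inside each merged run
  n_hetmiss <- vapply(seq_along(starts),
                      function(k) sum(is_hetmiss[starts[k]:ends[k]]),
                      integer(1))
  keep <- r2$values & n_hetmiss >= run.size
  data.frame(start = starts[keep], end = ends[keep])
}

# Hypothetical LRR check for a candidate interval: at least ct.thresh values
# below lrr.cut, and (assumed reading) a fraction above frac.thresh.
lrr_supports_deletion <- function(lrr, lrr.cut = -2, ct.thresh = 10,
                                  frac.thresh = 0.1) {
  below <- sum(lrr < lrr.cut, na.rm = TRUE)
  frac <- below / sum(!is.na(lrr))
  below >= ct.thresh && frac > frac.thresh
}

x <- c(FALSE, rep(TRUE, 6), FALSE, FALSE, rep(TRUE, 5), rep(FALSE, 3), TRUE)
find_hetmiss_runs(x)
lrr_supports_deletion(c(rep(-3, 12), rep(0, 8)))
```

With the defaults, the two homozygotes at positions 8 and 9 are absorbed, giving a single run from marker 2 to marker 14, while the three homozygotes at positions 15 to 17 end it.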
Male samples are not processed for the X chromosome.
More stringent criteria are applied to some segments (see low.frac.used, low.frac.used.num.mark, very.low.frac.used, low.qual.ids, and low.qual.frac.num.mark).
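As a rough illustration of the "fraction used" criteria, here is a base-R sketch. It is a simplification under assumptions: frac_used_filter is a hypothetical name, one denominator (eligible markers in the interval) is used for both fractions, and a segment below low.frac.used is assumed to survive only if it also has at least low.frac.used.num.mark markers. The low-quality-scan rule is not modeled.

```r
# Hypothetical sketch (not GWASTools code) of the "fraction used" rules.
# num.mark: het/missing markers per segment; n.eligible: eligible markers
# (from snp.ids) in the same interval.
frac_used_filter <- function(num.mark, n.eligible, low.frac.used = 0.1,
                             low.frac.used.num.mark = 30,
                             very.low.frac.used = 0.01) {
  frac.used <- num.mark / n.eligible
  # very low fractions tend to be false positives and are always dropped
  not_very_low <- frac.used >= very.low.frac.used
  # low fractions must also clear the stricter marker-count threshold
  passes_low <- frac.used >= low.frac.used | num.mark >= low.frac.used.num.mark
  not_very_low & passes_low
}

frac_used_filter(num.mark = c(5, 20, 40, 50), n.eligible = c(1000, 400, 500, 100))
# returns FALSE FALSE TRUE TRUE
```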
anomDetectBAF runs anomSegmentBAF with default values and then runs anomFilterBAF. Additional parameters for anomFilterBAF may be passed as arguments.
anomSegmentBAF returns a data.frame with the following columns (left and right refer to the start and end of the segment, respectively, in position order):
scanID: integer id of scan
chromosome: chromosome as integer code
left.index: row index of intenData indicating the left endpoint of the segment
right.index: row index of intenData indicating the right endpoint of the segment
num.mark: number of heterozygous or missing SNPs in the segment
seg.mean: mean of the BAF metric over the segment
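As a hedged illustration of this output format, the base-R sketch below builds a small made-up data.frame with the columns listed above and counts segments per scan and chromosome. Treating a chromosome with exactly one segment as "not segmented" is an assumption about how the baseline autosomes mentioned in Details are chosen; the object names are hypothetical and the values are illustrative only.

```r
# Toy data shaped like the anomSegmentBAF output (values are made up).
seg <- data.frame(
  scanID      = c(1L, 1L, 1L, 1L, 2L, 2L),
  chromosome  = c(1L, 2L, 2L, 3L, 1L, 2L),
  left.index  = c(1L, 101L, 160L, 301L, 1L, 101L),
  right.index = c(100L, 159L, 300L, 400L, 100L, 300L),
  num.mark    = c(40L, 20L, 50L, 45L, 38L, 70L),
  seg.mean    = c(0.21, 0.45, 0.22, 0.20, 0.19, 0.23)
)

# number of segments per scan/chromosome; a count of 1 is taken to mean
# that no change point was found on that chromosome (assumption)
n.seg <- aggregate(left.index ~ scanID + chromosome, data = seg, FUN = length)
names(n.seg)[3] <- "n.segments"
unsegmented <- n.seg[n.seg$n.segments == 1, c("scanID", "chromosome")]
unsegmented
```

Here scan 1 has two segments on chromosome 2, so only that scan/chromosome pair is excluded from the unsegmented set.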
anomFilterBAF and anomDetectBAF return a list with the following elements:
raw: data.frame of raw segmentation data, with the same output as anomSegmentBAF
filtered: data.frame of the segments identified as anomalies, with the same columns as …
base.info: data.frame with columns: …
seg.info: data.frame with columns: …
It is recommended to include all autosomes as input. This ensures a more accurate determination of baseline information.
Cecelia Laurie
See references in segment in the package DNAcopy.
The BAF metric used is modified from Itsara, A., et al. (2009). Population Analysis of Large Copy Number Variants and Hotspots of Human Genetic Disease. American Journal of Human Genetics, 84, 148–161.
segment and smooth.CNA in the package DNAcopy; also findBAFvariance and anomDetectLOH.
library(GWASTools)
library(GWASdata)
data(illuminaScanADF, illuminaSnpADF)

# BAF/LRR intensity data
blfile <- system.file("extdata", "illumina_bl.gds", package="GWASdata")
bl <- GdsIntensityReader(blfile)
blData <- IntensityData(bl, scanAnnot=illuminaScanADF, snpAnnot=illuminaSnpADF)

# genotype calls
genofile <- system.file("extdata", "illumina_geno.gds", package="GWASdata")
geno <- GdsGenotypeReader(genofile)
genoData <- GenotypeData(geno, scanAnnot=illuminaScanADF, snpAnnot=illuminaSnpADF)

# segment BAF for the first two scans
scan.ids <- illuminaScanADF$scanID[1:2]
chrom.ids <- unique(illuminaSnpADF$chromosome)
# eligible SNPs: drop those with missing.n1 equal to 1
snp.ids <- illuminaSnpADF$snpID[illuminaSnpADF$missing.n1 < 1]
seg <- anomSegmentBAF(blData, genoData, scan.ids=scan.ids,
                      chrom.ids=chrom.ids, snp.ids=snp.ids)

# filter segments to detect anomalies
data(centromeres.hg18)
filt <- anomFilterBAF(blData, genoData, segments=seg, snp.ids=snp.ids,
                      centromere=centromeres.hg18)

# alternatively, run both steps at once
anom <- anomDetectBAF(blData, genoData, scan.ids=scan.ids, chrom.ids=chrom.ids,
                      snp.ids=snp.ids, centromere=centromeres.hg18)

# close the underlying GDS files
close(blData)
close(genoData)