anomDetectLOH: LOH Method for Chromosome Anomaly Detection
In GWASTools: Tools for Genome Wide Association Studies

anomDetectLOH breaks a chromosome up into segments of homozygous runs of SNP markers determined by change points in Log R Ratio and selects segments which are likely to be anomalous.

anomDetectLOH(intenData, genoData, scan.ids, chrom.ids, snp.ids,
  known.anoms, smooth = 50, min.width = 5, nperm = 10000, alpha = 0.001,
  run.size = 50, inter.size = 4, homodel.min.num = 10, homodel.thresh = 10,
  small.num = 20, small.thresh = 2.25, medium.num = 50, medium.thresh = 2,
  long.num = 100, long.thresh = 1.5, small.na.thresh = 2.5,
  length.factor = 5, merge.fac = 0.85, min.lrr.num = 20, verbose = TRUE)

`intenData`	An `IntensityData` object containing the Log R Ratio. The order of the rows of intenData and the snp annotation are expected to be by chromosome and then by position within chromosome. The scan annotation should contain sex, coded as "M" for male and "F" for female.
`genoData`	A `GenotypeData` object. The order of the rows of genoData and the snp annotation are expected to be by chromosome and then by position within chromosome.
`scan.ids`	vector of scan ids (sample numbers) to process
`chrom.ids`	vector of (unique) chromosomes to process. Should correspond to integer chromosome codes in `intenData`. Recommended for use with autosomes, X (males will be ignored), and the pseudoautosomal (XY) region.
`snp.ids`	vector of eligible snp ids. Usually exclude failed and intensity-only snps. Also recommended to exclude an HLA region on chromosome 6 and XTR region on X chromosome. See `HLA` and `pseudoautosomal`. If there are SNPs annotated in the centromere gap, exclude these as well (see `centromeres`).
`known.anoms`	data.frame of known anomalies (usually from `anomDetectBAF`); must have "scanID","chromosome","left.index","right.index". Here "left.index" and "right.index" are row indices of intenData. Left and right refer to start and end of anomaly, respectively, in position order.
`smooth`	number of markers for smoothing region. See `smooth.CNA` in the DNAcopy package.
`min.width`	minimum number of markers for segmenting. See `segment` in the DNAcopy package.
`nperm`	number of permutations. See `segment` in the DNAcopy package.
`alpha`	significance level. See `segment` in the DNAcopy package.
`run.size`	number of markers to declare a 'homozygous' run (here 'homozygous' includes homozygous and missing)
`inter.size`	number of consecutive heterozygous markers allowed to interrupt a 'homozygous' run
`homodel.min.num`	minimum number of markers to detect extreme difference in lrr (for homozygous deletion)
`homodel.thresh`	threshold for measure of deviation from non-anomalous needed to declare segment a homozygous deletion.
`small.num`	minimum number of SNP markers to declare segment as an anomaly (other than homozygous deletion)
`small.thresh`	threshold for measure of deviation from non-anomalous to declare segment anomalous if number of SNP markers is between `small.num` and `medium.num`.
`medium.num`	threshold for number of SNP markers to identify 'medium' size segment
`medium.thresh`	threshold for measure of deviation from non-anomalous needed to declare segment anomalous if number of SNP markers is between `medium.num` and `long.num`.
`long.num`	threshold for number of SNP markers to identify 'long' size segment
`long.thresh`	threshold for measure of deviation from non-anomalous when number of markers is bigger than `long.num`
`small.na.thresh`	threshold for measure of deviation from non-anomalous when number of markers is between `small.num` and `medium.num` and 'local mad.fac' is NA. See Details section for definition of 'local mad.fac'.
`length.factor`	window around anomaly defined as `length.factor`*(no. of markers in segment) on either side of the given segment. Used in determining 'local mad.fac'. See Details section.
`merge.fac`	threshold for 'sd.fac'= number of baseline standard deviations of segment mean from baseline mean; consecutive segments with 'sd.fac' above threshold are merged
`min.lrr.num`	if any 'non-anomalous' interval has fewer markers than `min.lrr.num`, interval is ignored in finding non-anomalous baseline unless it's the only piece left
`verbose`	logical indicator whether to print the scan id currently being processed
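
The exclusions recommended for `snp.ids` can be sketched in base R. The annotation table and region coordinates below are made-up placeholders, not real genome coordinates, and the names `snp`, `exclude`, and `snp.ids.toy` are hypothetical; in practice the regions would come from the `HLA`, `pseudoautosomal`, and `centromeres` data sets mentioned above.

```r
# Toy SNP annotation; all values are illustrative placeholders.
snp <- data.frame(
  snpID      = 1:8,
  chromosome = c(6, 6, 6, 23, 23, 1, 1, 1),
  position   = c(100, 250, 400, 150, 900, 50, 500, 950),
  failed     = c(FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE)
)

# Hypothetical regions to exclude (integer chromosome code, start, end).
exclude <- data.frame(
  chrom      = c(6, 23, 1),
  start.base = c(200, 800, 400),
  end.base   = c(300, 1000, 600)
)

# Flag SNPs that fall inside any excluded region.
in.region <- rep(FALSE, nrow(snp))
for (i in seq_len(nrow(exclude))) {
  in.region <- in.region |
    (snp$chromosome == exclude$chrom[i] &
       snp$position >= exclude$start.base[i] &
       snp$position <= exclude$end.base[i])
}

# Eligible SNPs: not failed and not inside an excluded region.
snp.ids.toy <- snp$snpID[!snp$failed & !in.region]
```

The resulting vector plays the role of `snp.ids`; the same pattern extends to any number of excluded regions by adding rows to the region table.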

Detection of anomalies with loss of heterozygosity accompanied by change in Log R Ratio. Male samples for X chromosome are not processed.

Circular binary segmentation (CBS), via the R package DNAcopy, is applied to the LRR values. In parallel, runs of homozygous or missing genotypes of at least `run.size` markers are identified, allowing interruptions of no more than `inter.size` consecutive heterozygous SNPs. Intervals from `known.anoms` are excluded from the identification of runs. After possible merging of consecutive CBS segments (based on satisfying the `merge.fac` threshold for deviation from the non-anomalous baseline), the homozygous runs are intersected with the CBS segments.
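
The run-finding step can be illustrated with a minimal base R sketch. This is not the package's implementation: the helper `find_homo_runs` is hypothetical, genotypes are assumed to be coded as the number of A alleles (0, 1, 2, with 1 heterozygous and NA missing), and the way short heterozygous stretches are bridged is an assumption.

```r
# Hypothetical helper: find 'homozygous' runs in a genotype vector.
find_homo_runs <- function(geno, run.size = 50, inter.size = 4) {
  # 'Homozygous' here includes homozygous and missing calls.
  homo <- is.na(geno) | geno != 1
  r <- rle(homo)
  n <- length(r$values)
  # Short heterozygous stretches flanked by homozygous stretches are
  # treated as interruptions and absorbed into the surrounding run.
  interior <- seq_len(n) > 1 & seq_len(n) < n
  bridge <- !r$values & r$lengths <= inter.size & interior
  r$values[bridge] <- TRUE
  # Re-encode so bridged stretches merge with their neighbours.
  r2 <- rle(inverse.rle(r))
  end <- cumsum(r2$lengths)
  start <- end - r2$lengths + 1
  keep <- r2$values & r2$lengths >= run.size
  data.frame(left.index = start[keep], right.index = end[keep])
}

# Toy genotypes; small run.size and inter.size so the example is readable.
geno <- c(1, 0, 0, 2, 1, 0, NA, 2, 2, 1, 1, 1, 0, 0)
runs <- find_homo_runs(geno, run.size = 5, inter.size = 1)
```

In this toy vector the single heterozygous call at position 5 is bridged, while the three consecutive heterozygous calls at positions 10 to 12 end the run.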

Determination of anomalous segments is based on a combination of number-of-marker thresholds and deviation from a non-anomalous baseline. Segments are declared anomalous if their deviation from non-anomalous is above the corresponding thresholds (see `small.num`, `small.thresh`, `medium.num`, `medium.thresh`, `long.num`, `long.thresh`, and `small.na.thresh`). Non-anomalous median and MAD are defined for each sample-chromosome combination. Intervals from `known.anoms` and the homozygous runs identified are excluded; the remaining regions are the non-anomalous baseline.
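
How the size classes map to thresholds can be sketched as follows. The helper `pick_thresh` is hypothetical and only mirrors the argument descriptions; the handling of boundary values (strict versus non-strict inequalities) is an assumption, and the separate homozygous-deletion rule (`homodel.min.num`, `homodel.thresh`) is not covered.

```r
# Hypothetical helper: choose the deviation threshold for a segment
# from its number of eligible markers. Defaults follow the Usage section.
pick_thresh <- function(num.mark, local.is.na = FALSE,
                        small.num = 20, small.thresh = 2.25,
                        medium.num = 50, medium.thresh = 2,
                        long.num = 100, long.thresh = 1.5,
                        small.na.thresh = 2.5) {
  # Too few markers to declare an anomaly (assumed strict boundary).
  if (num.mark < small.num) return(NA_real_)
  # Small segments: stricter threshold when 'local mad.fac' is NA.
  if (num.mark < medium.num) {
    return(if (local.is.na) small.na.thresh else small.thresh)
  }
  # Medium segments.
  if (num.mark < long.num) return(medium.thresh)
  # Long segments.
  long.thresh
}

thresholds <- sapply(c(10, 30, 75, 150), pick_thresh)
```

Shorter segments require a larger deviation, which reflects that fewer markers give a noisier estimate of the segment median.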

Deviation from non-anomalous is measured by a combination of a chromosome-wide 'mad.fac' and a 'local mad.fac' (both the average and the minimum of these two measures are used). Here 'mad.fac' is (segment median - non-anomalous median)/(non-anomalous MAD), and 'local mad.fac' has the same definition except that the non-anomalous median and MAD are computed over a window including the segment (see `length.factor`). Median and MAD are found for eligible LRR values.
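
A minimal sketch of the two measures on simulated data may help. The LRR values are simulated, the helper `mad_fac` is hypothetical, and two assumptions are made: that MAD is computed with R's `mad()` using its default scaling constant, and that the segment's own values are left out of the local baseline. The toy chromosome contains no other anomalies or homozygous runs, so everything outside the segment serves as the chromosome-wide baseline.

```r
set.seed(1)
# Simulated LRR for one chromosome (illustrative data only).
lrr <- rnorm(1000, mean = 0, sd = 0.1)
seg <- 451:500                      # candidate segment (50 markers)
lrr[seg] <- lrr[seg] - 0.5          # simulate a drop in LRR

# (segment median - baseline median) / baseline MAD
mad_fac <- function(seg.vals, base.vals) {
  (median(seg.vals, na.rm = TRUE) - median(base.vals, na.rm = TRUE)) /
    mad(base.vals, na.rm = TRUE)
}

# Chromosome-wide baseline: here, everything outside the segment.
chrom.mad.fac <- mad_fac(lrr[seg], lrr[-seg])

# Local baseline: length.factor * (markers in segment) on either side.
# The segment sits mid-chromosome, so both flanks are non-empty.
length.factor <- 5
w <- length.factor * length(seg)
left  <- max(1, min(seg) - w):(min(seg) - 1)
right <- (max(seg) + 1):min(length(lrr), max(seg) + w)
local.mad.fac <- mad_fac(lrr[seg], lrr[c(left, right)])

# Average and minimum of the two measures, taken on absolute values.
devs <- abs(c(chrom.mad.fac, local.mad.fac))
dev.summary <- c(average = mean(devs), minimum = min(devs))
```

The local measure guards against calling a segment anomalous when the whole neighbourhood, rather than the segment alone, drifts away from the chromosome-wide baseline.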

A list with the following elements:

`raw`	raw homozygous run data, not including any regions present in `known.anoms`. A data.frame with the following columns (left and right refer to start and end of anomaly, respectively, in position order):
  `left.index`: row index of intenData indicating left endpoint of segment
  `right.index`: row index of intenData indicating right endpoint of segment
  `left.base`: base position of left endpoint of segment
  `right.base`: base position of right endpoint of segment
  `scanID`: integer id of scan
  `chromosome`: chromosome as integer code
`raw.adjusted`	data.frame of runs after merging and intersecting with CBS segments, with the following columns (left and right refer to start and end of anomaly, respectively, in position order):
  `scanID`: integer id of scan
  `chromosome`: chromosome as integer code
  `left.index`: row index of intenData indicating left endpoint of segment
  `right.index`: row index of intenData indicating right endpoint of segment
  `left.base`: base position of left endpoint of segment
  `right.base`: base position of right endpoint of segment
  `num.mark`: number of eligible SNP markers in segment
  `seg.median`: median of eligible LRR values in segment
  `seg.mean`: mean of eligible LRR values in segment
  `mad.fac`: measure of deviation from non-anomalous baseline, equal to abs(median of segment - baseline median)/(baseline MAD); used in determining anomalous segments
  `sd.fac`: measure of deviation from non-anomalous baseline, equal to abs(mean of segment - baseline mean)/(baseline standard deviation); used in determining whether to merge
  `local`: measure of deviation from non-anomalous baseline, equal to abs(median of segment - local baseline median)/(local baseline MAD); local baseline consists of eligible LRR values in a window around segment; used in determining anomalous segments
  `num.segs`: number of segments found by CBS for the given chromosome
  `chrom.nonanom.mad`: MAD of eligible LRR values in non-anomalous regions across the chromosome
  `chrom.nonanom.median`: median of eligible LRR values in non-anomalous regions across the chromosome
  `chrom.nonanom.mean`: mean of eligible LRR values in non-anomalous regions across the chromosome
  `chrom.nonanom.sd`: standard deviation of eligible LRR values in non-anomalous regions across the chromosome
  `sex`: sex of the scan id coded as "M" or "F"
`filtered`	data.frame of the segments identified as anomalies. Columns are the same as in `raw.adjusted`.
`base.info`	data.frame with columns:
  `chrom.nonanom.mad`: MAD of eligible LRR values in non-anomalous regions across the chromosome
  `chrom.nonanom.median`: median of eligible LRR values in non-anomalous regions across the chromosome
  `chrom.nonanom.mean`: mean of eligible LRR values in non-anomalous regions across the chromosome
  `chrom.nonanom.sd`: standard deviation of eligible LRR values in non-anomalous regions across the chromosome
  `sex`: sex of the scan id coded as "M" or "F"
  `num.runs`: number of original homozygous runs found for given scan/chromosome
  `num.segs`: number of segments for given scan/chromosome produced by CBS
  `scanID`: integer id of scan
  `chromosome`: chromosome as integer code
`segments`	data.frame of the segmentation found by CBS with columns:
  `scanID`: integer id of scan
  `chromosome`: chromosome as integer code
  `left.index`: row index of intenData indicating left endpoint of segment
  `right.index`: row index of intenData indicating right endpoint of segment
  `left.base`: base position of left endpoint of segment
  `right.base`: base position of right endpoint of segment
  `num.mark`: number of eligible SNP markers in the segment
  `seg.mean`: mean of eligible LRR values in the segment
  `sd.fac`: measure of deviation from baseline equal to abs(mean of segment - baseline mean)/(baseline standard deviation) where the baseline is over non-anomalous regions
`merge`	data.frame of scan id/chromosome pairs for which merging occurred, with columns:
  `scanID`: integer id of scan
  `chromosome`: chromosome as integer code
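
The role of `sd.fac` in merging can be sketched on a toy segment table. This is an illustration under stated assumptions, not the package's code: the segment values and the baseline mean and standard deviation are made up, and "merged" is taken to mean that runs of consecutive segments with `sd.fac` above `merge.fac` are collapsed into one segment, ignoring the direction of the deviation.

```r
# Toy CBS segments for one scan/chromosome (illustrative values only).
segs <- data.frame(
  left.index  = c(1, 101, 151, 201, 401),
  right.index = c(100, 150, 200, 400, 500),
  seg.mean    = c(0.01, -0.40, -0.35, 0.00, -0.30)
)

# Hypothetical non-anomalous baseline for this chromosome.
base.mean <- 0
base.sd   <- 0.2
merge.fac <- 0.85

# sd.fac = abs(mean of segment - baseline mean) / (baseline standard deviation)
segs$sd.fac <- abs(segs$seg.mean - base.mean) / base.sd
above <- segs$sd.fac > merge.fac

# Start a new group unless this segment and the previous one are both above
# the threshold; consecutive above-threshold segments share a group id.
prev.above <- c(FALSE, head(above, -1))
grp <- cumsum(!(above & prev.above))

# Collapse each group to its outer endpoints.
merged <- data.frame(
  left.index  = as.vector(tapply(segs$left.index, grp, min)),
  right.index = as.vector(tapply(segs$right.index, grp, max))
)
```

Here the second and third segments both exceed the threshold and are adjacent, so they collapse into a single segment spanning indices 101 to 200; the last segment also exceeds the threshold but has no above-threshold neighbour and stays as it is.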

Cecelia Laurie

See references in segment in the package DNAcopy.

`segment` and `smooth.CNA` in the package DNAcopy, also `findBAFvariance`, `anomDetectBAF`

library(GWASdata)
data(illuminaScanADF, illuminaSnpADF)

# IntensityData object holding the BAF/LRR values
blfile <- system.file("extdata", "illumina_bl.gds", package="GWASdata")
bl <- GdsIntensityReader(blfile)
blData <- IntensityData(bl, scanAnnot=illuminaScanADF, snpAnnot=illuminaSnpADF)

# GenotypeData object holding the genotype calls
genofile <- system.file("extdata", "illumina_geno.gds", package="GWASdata")
geno <- GdsGenotypeReader(genofile)
genoData <- GenotypeData(geno, scanAnnot=illuminaScanADF, snpAnnot=illuminaSnpADF)

# process the first two scans on all annotated chromosomes
scan.ids <- illuminaScanADF$scanID[1:2]
chrom.ids <- unique(illuminaSnpADF$chromosome)
# eligible SNPs: those with missing.n1 below 1
snp.ids <- illuminaSnpADF$snpID[illuminaSnpADF$missing.n1 < 1]

# example for known.anoms, would get this from anomDetectBAF
known.anoms <- data.frame("scanID"=scan.ids[1],"chromosome"=21,
  "left.index"=100,"right.index"=200)

LOH.anom <- anomDetectLOH(blData, genoData, scan.ids=scan.ids,
  chrom.ids=chrom.ids, snp.ids=snp.ids, known.anoms=known.anoms)

# close the underlying GDS files
close(blData)
close(genoData)