anomSegStats: Calculate LRR and BAF statistics for anomalous segments

Description

Calculate LRR (log R ratio) and BAF (B allele frequency) statistics for anomalous segments, and plot the results.

Usage

anomSegStats(intenData, genoData, snp.ids, anom, centromere,
  lrr.cut = -2, verbose = TRUE)

anomStatsPlot(intenData, genoData, anom.stats, snp.ineligible,
  plot.ineligible = FALSE, centromere = NULL,
  brackets = c("none", "bases", "markers"), brkpt.pct = 10,
  whole.chrom = FALSE, win = 5, win.calc = FALSE, win.fixed = 1,
  zoom = c("both", "left", "right"), main = NULL, info = NULL,
  ideogram = TRUE, ideo.zoom = FALSE, ideo.rect = TRUE,
  mult.anom = FALSE, cex = 0.5, cex.leg = 1.5,
  colors = c("default", "neon", "primary"), ...)

Arguments

intenData

An IntensityData object containing BAlleleFreq and LogRRatio. The rows of intenData and the SNP annotation are expected to be sorted by chromosome and then by position within chromosome.

genoData

A GenotypeData object. The rows of genoData and the SNP annotation are expected to be sorted by chromosome and then by position within chromosome.

snp.ids

vector of eligible SNP IDs. Failed and intensity-only SNPs are usually excluded. It is also recommended to exclude the HLA region on chromosome 6 and the XTR region on the X chromosome; see HLA and pseudoautosomal. If any SNPs are annotated as lying in the centromere gap, exclude these as well (see centromeres).

anom

data.frame of detected chromosome anomalies. Names must include "scanID", "chromosome", "left.index", "right.index", "sex", "method", "anom.id". Valid values for "method" are "BAF" or "LOH", indicating whether the anomaly was detected by the BAF method (anomDetectBAF) or by the LOH method (anomDetectLOH). Here "left.index" and "right.index" are row indices of intenData, with left.index < right.index.
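As a rough illustration of these requirements, the base R sketch below checks an anomaly data.frame for the required columns, valid "method" values, and left.index < right.index before it is passed to anomSegStats. The helper check_anom is hypothetical and not part of GWASTools; it does not reproduce whatever checks the package itself performs.

```r
# Hypothetical helper (not part of GWASTools): sanity-check an `anom`
# data.frame against the requirements described above.
check_anom <- function(anom) {
  required <- c("scanID", "chromosome", "left.index", "right.index",
                "sex", "method", "anom.id")
  missing <- setdiff(required, names(anom))
  if (length(missing) > 0) {
    stop("anom is missing columns: ", paste(missing, collapse = ", "))
  }
  if (!all(anom$method %in% c("BAF", "LOH"))) {
    stop("method must be \"BAF\" or \"LOH\"")
  }
  if (any(anom$left.index >= anom$right.index)) {
    stop("left.index must be less than right.index")
  }
  invisible(TRUE)
}

anom <- data.frame(scanID = 1, chromosome = 21, left.index = 100,
                   right.index = 200, sex = "M", method = "BAF",
                   anom.id = 1, stringsAsFactors = FALSE)
check_anom(anom)
```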

centromere

data.frame with centromere position info. Names must include "chrom", "left.base", "right.base". Valid values for "chrom" are 1:22, "X", "Y", "XY". Here "left.base" and "right.base" are start and end base positions of the centromere location, respectively. Centromere data tables are provided in centromeres.

lrr.cut

LRR threshold; the number of eligible markers within the anomaly with LRR values less than lrr.cut is reported as nmark.lrr.low

verbose

whether to print the scan id currently being processed

anom.stats

data.frame of chromosome anomalies with statistics, usually the output of anomSegStats. Names must include "anom.id", "scanID", "chromosome", "left.index", "right.index", "method", "nmark.all", "nmark.elig", "left.base", "right.base", "nbase", "non.anom.baf.med", "non.anom.lrr.med", "anom.baf.dev.med", "anom.baf.dev.5", "anom.lrr.med", "nmark.baf", "nmark.lrr". Left and right refer to start and end, respectively, of the anomaly, in position order.

snp.ineligible

vector of ineligible SNP IDs (e.g., intensity-only SNPs, failed SNPs, and the XTR and HLA regions). See HLA and pseudoautosomal.

plot.ineligible

whether or not to include ineligible points in the plot for LogRRatio

brackets

type of brackets to plot around breakpoints: "none", "bases" (bracket width based on length in bases), or "markers" (bracket width based on number of markers; note that using markers gives asymmetric brackets). Together with brkpt.pct, this can be used to assess the general accuracy of the anomaly end points.

brkpt.pct

percent of anomaly length in bases (or number of markers) for width of brackets
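A minimal sketch of the bracket arithmetic for brackets = "bases", assuming the bracket width is brkpt.pct percent of (right.base - left.base) and that each bracket extends that distance on both sides of its breakpoint. The helper bracket_bases is hypothetical; anomStatsPlot may define the width and placement differently.

```r
# Hypothetical helper (assumption): bracket limits for brackets = "bases",
# taking the width as brkpt.pct percent of (right.base - left.base) and
# placing each bracket symmetrically around its breakpoint.
bracket_bases <- function(left.base, right.base, brkpt.pct = 10) {
  width <- (right.base - left.base) * brkpt.pct / 100
  c(left.lo  = left.base  - width, left.hi  = left.base  + width,
    right.lo = right.base - width, right.hi = right.base + width)
}

bracket_bases(1e6, 2e6)
```

Under these assumptions, the default brkpt.pct = 10 gives a 1 Mb anomaly brackets extending 100 kb on each side of each breakpoint.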

whole.chrom

logical to plot the whole chromosome or not (overrides win and zoom)

win

size of the window (a multiple of anomaly length) surrounding the anomaly to plot

win.calc

logical to calculate window size from anomaly length; overrides win and gives window of fixed length given by win.fixed

win.fixed

number of megabases for window size when win.calc=TRUE

zoom

indicates whether plot includes the whole anomaly ("both") or zooms on just the left or right breakpoint; "both" is default

main

Vector of titles for upper (LRR) plots. If NULL, titles will include anom.id, scanID, sex, chromosome, and detection method.

info

character vector of extra information to include in the main title of the upper (LRR) plot

ideogram

logical for whether to plot a chromosome ideogram under the BAF and LRR plots.

ideo.zoom

logical for whether to zoom in on the ideogram to match the range of the BAF/LRR plots

ideo.rect

logical for whether to draw a rectangle on the ideogram indicating the range of the BAF/LRR plots

mult.anom

logical for whether to plot multiple anomalies from the same scan-chromosome pair on a single plot. If FALSE (default), each anomaly is shown on a separate plot.

cex

cex value for points on the plots

cex.leg

cex value for the ideogram legend

colors

Color scheme to use for genotypes. "default" is colorblind safe (colorbrewer Set2), "neon" is bright orange/green/fuchsia, and "primary" is red/green/blue.

...

Other parameters to be passed directly to plot.

Details

anomSegStats computes various statistics for the input anomalies. Some of these are basic statistics describing the characteristics of each anomaly and measuring the deviation of LRR or BAF from expected values. Others are used in downstream quality control analysis, such as detecting terminal anomalies and investigating centromere-spanning anomalies.

anomStatsPlot produces a separate PNG image for each anomaly in anom.stats. Each image consists of an upper plot of LogRRatio values and a lower plot of BAlleleFreq values, covering either a zoomed region around the anomaly or the whole chromosome (depending on parameter choices). Each plot has vertical lines marking the anomaly boundaries and horizontal lines showing selected statistics from anomSegStats. The upper plot title includes the sample number and chromosome. Further plot annotation describes which anomaly statistics are represented.

Value

anomSegStats returns a data.frame containing the columns of anom plus the following columns. Here "left" and "right" refer to position order, with left < right.

nmark.all

total number of SNP markers on the array from left.index to right.index inclusive

nmark.elig

total number of eligible SNP markers on the array from left.index to right.index, inclusive. See snp.ids for definition of eligible SNP markers.

left.base

base position corresponding to left.index

right.base

base position corresponding to right.index

nbase

number of bases from left.index to right.index, inclusive

non.anom.baf.med

BAF median of non-anomalous segments on all autosomes for the associated sample, using eligible heterozygous or missing SNP markers

non.anom.lrr.med

LRR median of non-anomalous segments on all autosomes for the associated sample, using eligible SNP markers

non.anom.lrr.mad

MAD for LRR of non-anomalous segments on all autosomes for the associated sample, using eligible SNP markers
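The non-anomalous statistics can be sketched in base R as follows. This assumes the input vectors cover the autosomes of a single sample, that in.anom flags markers inside any of that sample's detected anomalies, that genotypes are coded as the number of A alleles (so 1 means heterozygous), and that the MAD uses the default scaling constant of R's mad(). non_anom_stats is a hypothetical helper, not the GWASTools code.

```r
# Hypothetical helper: non-anomalous BAF/LRR summaries for one sample.
# Assumed inputs (all the same length, autosomal markers only):
#   baf, lrr  numeric BAF and LRR values
#   geno      genotype calls, 0/1/2/NA with 1 = heterozygous (assumed coding)
#   elig      TRUE for eligible markers
#   in.anom   TRUE for markers inside any detected anomaly
non_anom_stats <- function(baf, lrr, geno, elig, in.anom) {
  non.anom <- elig & !in.anom               # eligible markers outside anomalies
  het.or.miss <- is.na(geno) | geno == 1    # heterozygous or missing calls
  c(non.anom.baf.med = median(baf[non.anom & het.or.miss], na.rm = TRUE),
    non.anom.lrr.med = median(lrr[non.anom], na.rm = TRUE),
    non.anom.lrr.mad = mad(lrr[non.anom], na.rm = TRUE))
}

s <- non_anom_stats(baf  = c(0.48, 0.52, 0.0, 0.9, 0.5),
                    lrr  = c(0.1, -0.1, 0.0, -1.5, 0.2),
                    geno = c(1, NA, 0, 1, 1),
                    elig = c(TRUE, TRUE, TRUE, TRUE, FALSE),
                    in.anom = c(FALSE, FALSE, FALSE, TRUE, FALSE))
s
```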

anom.baf.dev.med

median of BAF deviations from non.anom.baf.med, using the markers used to detect the anomaly (eligible markers that are heterozygous or missing)

anom.baf.dev.5

median of BAF deviations from 0.5, using eligible heterozygous or missing SNP markers in anomaly

anom.baf.dev.mean

mean of BAF deviations from non.anom.baf.med, using eligible heterozygous or missing SNP markers in anomaly

anom.baf.sd

standard deviation of BAF deviations from non.anom.baf.med, using eligible heterozygous or missing SNP markers in anomaly

anom.baf.mad

MAD of BAF deviations from non.anom.baf.med, using eligible heterozygous or missing SNP markers in anomaly
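A corresponding sketch for the anomaly BAF statistics, assuming a "deviation" is the absolute difference |BAF - non.anom.baf.med| (or |BAF - 0.5| for anom.baf.dev.5), and using the same assumed genotype coding as above. The helper anom_baf_stats and the absolute-value convention are assumptions, not confirmed GWASTools behavior.

```r
# Hypothetical helper: BAF deviation summaries within one anomaly.
# Assumption: deviations are absolute differences from the reference value.
anom_baf_stats <- function(baf, geno, elig, in.anom, non.anom.baf.med) {
  use <- elig & in.anom & (is.na(geno) | geno == 1)  # eligible het/missing markers
  b <- baf[use]
  dev <- abs(b - non.anom.baf.med)
  c(anom.baf.dev.med  = median(dev, na.rm = TRUE),
    anom.baf.dev.5    = median(abs(b - 0.5), na.rm = TRUE),
    anom.baf.dev.mean = mean(dev, na.rm = TRUE),
    anom.baf.sd       = sd(dev, na.rm = TRUE),
    anom.baf.mad      = mad(dev, na.rm = TRUE),
    nmark.baf         = sum(use))
}

s <- anom_baf_stats(baf = c(0.3, 0.7, 0.2, 0.8, 0.5),
                    geno = c(1, 1, NA, 1, 0),
                    elig = rep(TRUE, 5), in.anom = rep(TRUE, 5),
                    non.anom.baf.med = 0.5)
s
```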

anom.lrr.med

LRR median of eligible SNP markers within the anomaly

anom.lrr.sd

standard deviation of LRR for eligible SNP markers within the anomaly

anom.lrr.mad

MAD of LRR for eligible SNP markers within the anomaly

nmark.baf

number of SNP markers within the anomaly eligible for BAF detection (eligible markers that are heterozygous or missing)

nmark.lrr

number of SNP markers within the anomaly eligible for LOH detection (eligible markers)
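The LRR statistics and the LRR marker count within an anomaly reduce to simple summaries over the eligible markers. This hypothetical helper assumes in.anom flags the markers from left.index to right.index inclusive and that mad() is used with its default scaling.

```r
# Hypothetical helper: LRR summaries over eligible markers in one anomaly.
anom_lrr_stats <- function(lrr, elig, in.anom) {
  x <- lrr[elig & in.anom]
  c(anom.lrr.med = median(x, na.rm = TRUE),
    anom.lrr.sd  = sd(x, na.rm = TRUE),
    anom.lrr.mad = mad(x, na.rm = TRUE),
    nmark.lrr    = sum(elig & in.anom))   # eligible markers in the anomaly
}

s <- anom_lrr_stats(lrr = c(-0.5, -0.3, -0.1, 0.1, 2),
                    elig = c(TRUE, TRUE, TRUE, TRUE, FALSE),
                    in.anom = rep(TRUE, 5))
s
```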

cent.rel

position relative to the centromere: "left", "right", or "span"
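One way to derive cent.rel from base positions, assuming an anomaly is "left" if it ends before the centromere starts, "right" if it begins after the centromere ends, and "span" otherwise (which would also include anomalies that only overlap the centromere gap). cent_rel is a hypothetical helper; cent.left and cent.right stand for the left.base and right.base columns of centromere for the anomaly's chromosome.

```r
# Hypothetical helper: classify anomalies relative to the centromere.
cent_rel <- function(left.base, right.base, cent.left, cent.right) {
  ifelse(right.base < cent.left, "left",
         ifelse(left.base > cent.right, "right", "span"))
}

cent_rel(left.base  = c(1e6, 5e7, 1e7),
         right.base = c(2e6, 6e7, 5e7),
         cent.left = 3e7, cent.right = 3.5e7)
```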

left.most

T/F for whether the anomaly is the left-most anomaly for this sample-chromosome, i.e. no other anomalies with smaller start base position

right.most

T/F for whether the anomaly is the right-most anomaly for this sample-chromosome, i.e. no other anomalies with larger end base position
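left.most and right.most can be computed by grouping anomalies by sample and chromosome. The base R sketch below uses ave() for the grouping; flag_most is a hypothetical helper, and ties (two anomalies sharing the extreme position) are simply both flagged TRUE here.

```r
# Hypothetical helper: flag the left-most and right-most anomaly within each
# scanID-chromosome pair, given left.base and right.base columns.
flag_most <- function(anom.stats) {
  key <- paste(anom.stats$scanID, anom.stats$chromosome)
  # smallest start and largest end position within each sample-chromosome
  min.left  <- ave(anom.stats$left.base,  key, FUN = min)
  max.right <- ave(anom.stats$right.base, key, FUN = max)
  anom.stats$left.most  <- anom.stats$left.base == min.left
  anom.stats$right.most <- anom.stats$right.base == max.right
  anom.stats
}

stats <- data.frame(scanID = c(1, 1, 2), chromosome = c(21, 21, 21),
                    left.base = c(100, 300, 50), right.base = c(200, 400, 80))
flag_most(stats)
```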

left.last.elig

T/F for whether the anomaly contains the last eligible SNP marker going to the left (decreasing position)

right.last.elig

T/F for whether the anomaly contains the last eligible SNP marker going to the right (increasing position)
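A sketch of these two flags, assuming elig.index holds the intenData row indices of the eligible markers on the anomaly's chromosome. last_elig_flags is a hypothetical helper.

```r
# Hypothetical helper: does the anomaly reach the outermost eligible markers?
last_elig_flags <- function(left.index, right.index, elig.index) {
  c(left.last.elig  = left.index  <= min(elig.index),
    right.last.elig = right.index >= max(elig.index))
}

last_elig_flags(left.index = 10, right.index = 50, elig.index = c(10, 20, 60))
```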

left.term.lrr.med

median of LRR for all eligible SNP markers from the left-most eligible marker to the left telomere (only calculated for the most distal anomaly)

right.term.lrr.med

median of LRR for all eligible markers from the right-most eligible marker to the right telomere (only calculated for the most distal anomaly)

left.term.lrr.n

sample size for calculating left.term.lrr.med

right.term.lrr.n

sample size for calculating right.term.lrr.med

cent.span.left.elig.n

number of eligible markers on the left side of centromere-spanning anomalies

cent.span.right.elig.n

number of eligible markers on the right side of centromere-spanning anomalies

cent.span.left.bases

length of anomaly (in bases) covered by eligible markers on the left side of the centromere

cent.span.right.bases

length of anomaly (in bases) covered by eligible markers on the right side of the centromere
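A sketch of the centromere-spanning counts for a single anomaly, assuming pos holds marker base positions on that chromosome and that the "length covered" on each side is the distance between the outermost and innermost eligible markers on that side (max minus min position). cent_span_stats is hypothetical, and GWASTools may measure these lengths differently.

```r
# Hypothetical helper: eligible-marker counts and covered bases on each side
# of the centromere for one anomaly [left.base, right.base].
cent_span_stats <- function(pos, elig, left.base, right.base,
                            cent.left, cent.right) {
  in.anom <- elig & pos >= left.base & pos <= right.base
  lpos <- pos[in.anom & pos < cent.left]    # eligible markers left of centromere
  rpos <- pos[in.anom & pos > cent.right]   # eligible markers right of centromere
  c(cent.span.left.elig.n  = length(lpos),
    cent.span.right.elig.n = length(rpos),
    cent.span.left.bases   = if (length(lpos) > 0) max(lpos) - min(lpos) else 0,
    cent.span.right.bases  = if (length(rpos) > 0) max(rpos) - min(rpos) else 0)
}

s <- cent_span_stats(pos = c(1e6, 2e6, 3e6, 4e7, 5e7), elig = rep(TRUE, 5),
                     left.base = 1e6, right.base = 5e7,
                     cent.left = 3.5e6, cent.right = 3.5e7)
s
```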

cent.span.left.index

index of eligible marker left-adjacent to centromere; recall that index refers to row indices of intenData

cent.span.right.index

index of eligible marker right-adjacent to centromere

bafmetric.anom.mean

mean of BAF-metric values within the anomaly, using eligible heterozygous or missing SNP markers. BAF-metric values were used in the detection of anomalies; see anomDetectBAF for the definition of BAF-metric.

bafmetric.non.anom.mean

mean of BAF-metric values within non-anomalous segments across all autosomes for the associated sample, using eligible heterozygous or missing SNP markers

bafmetric.non.anom.sd

standard deviation of BAF-metric values within non-anomalous segments across all autosomes for the associated sample, using eligible heterozygous or missing SNP markers

nmark.lrr.low

number of eligible markers within anomaly with LRR values less than lrr.cut
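nmark.lrr.low is a simple count. A minimal sketch, assuming a strict "less than" comparison against lrr.cut and that missing LRR values are ignored; count_low_lrr is a hypothetical helper.

```r
# Hypothetical helper: count eligible markers in the anomaly with LRR < lrr.cut.
count_low_lrr <- function(lrr, elig, in.anom, lrr.cut = -2) {
  sum(lrr[elig & in.anom] < lrr.cut, na.rm = TRUE)
}

count_low_lrr(lrr = c(-3, -2.5, -1, NA, -4),
              elig = c(TRUE, TRUE, TRUE, TRUE, FALSE),
              in.anom = rep(TRUE, 5))
```

Under this assumption, with the default lrr.cut = -2 a marker with LRR exactly -2 is not counted.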

Note

The non-anomalous statistics are computed over all autosomes for the sample associated with an anomaly. Therefore the accuracy of these statistics relies on the input anomaly data.frame including all autosomal anomalies for a given sample.

Author(s)

Cathy Laurie

See Also

anomDetectBAF, anomDetectLOH

Examples

library(GWASTools)
library(GWASdata)
data(illuminaScanADF, illuminaSnpADF)

# BAF/LRR intensity data, linked to the scan and SNP annotation
blfile <- system.file("extdata", "illumina_bl.gds", package="GWASdata")
bl <- GdsIntensityReader(blfile)
blData <- IntensityData(bl, scanAnnot=illuminaScanADF, snpAnnot=illuminaSnpADF)

# genotype data, linked to the same annotation
genofile <- system.file("extdata", "illumina_geno.gds", package="GWASdata")
geno <- GdsGenotypeReader(genofile)
genoData <- GenotypeData(geno, scanAnnot=illuminaScanADF, snpAnnot=illuminaSnpADF)

# eligible SNPs are those not missing in every sample (missing.n1 < 1);
# SNPs missing in every sample (missing.n1 == 1) are treated as failed
scan.ids <- illuminaScanADF$scanID[1:2]
chrom.ids <- unique(illuminaSnpADF$chromosome)
snp.ids <- illuminaSnpADF$snpID[illuminaSnpADF$missing.n1 < 1]
snp.failed <- illuminaSnpADF$snpID[illuminaSnpADF$missing.n1 == 1]

# example results from anomDetectBAF
baf.anoms <- data.frame("scanID"=rep(scan.ids[1],2), "chromosome"=rep(21,2),
  "left.index"=c(100,300), "right.index"=c(200,400), sex=rep("M",2),
  method=rep("BAF",2), anom.id=1:2, stringsAsFactors=FALSE)

# example results from anomDetectLOH
loh.anoms <- data.frame("scanID"=scan.ids[2],"chromosome"=22,
  "left.index"=400,"right.index"=500, sex="F", method="LOH",
  anom.id=3, stringsAsFactors=FALSE)

anoms <- rbind(baf.anoms, loh.anoms)
data(centromeres.hg18)
stats <- anomSegStats(blData, genoData, snp.ids=snp.ids, anom=anoms,
  centromere=centromeres.hg18)

anomStatsPlot(blData, genoData, anom.stats=stats,
  snp.ineligible=snp.failed, centromere=centromeres.hg18)

close(blData)
close(genoData)

smgogarten/GWASTools documentation built on Nov. 10, 2024, 9:54 p.m.