BAF.transform: Transform BAF into mBAF
In Piet: DNA CNV analysis tools based on fused lasso type of model


This function transforms BAF values into mirrored BAF (mBAF) values. SNPs that are non-informative for CNV inference are removed, and the resulting missing values at those positions are initialized with the average of the nearest SNPs.

BAF.transform(x, gt = NULL, mBAF.thd = 0.97, win.thd = 0.8, w = 1, k = 2, median.adjust = FALSE)

`x`	A vector of BAF values to be transformed.
`gt`	For tumor data, if the tumor sample under investigation has a matched normal tissue sample, `gt` is the vector of SNP genotypes in the matched normal sample. If no such information is available, leave it at the default, `NULL`.
`mBAF.thd`	A criterion for removing non-informative SNPs when no information from matched normal tissue is supplied. See the reference for details.
`win.thd`	A further criterion for removing possibly non-informative SNPs that pass the `mBAF.thd` criterion. See the reference for details.
`w`	The window size used to compute the quantity that is compared with `win.thd`. The default is 1. See the reference for details.
`k`	The number of nearest SNPs used to compute the initial values of removed non-informative SNPs.
`median.adjust`	Logical. If `TRUE`, the median of the BAF values between 0.25 and 0.75 is adjusted to 0.5 before any transformation is applied.
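The effect of `median.adjust` can be illustrated with a minimal sketch. This is not the package's implementation: `adjust_median` is a hypothetical helper, and shifting only the values inside the (0.25, 0.75) band is an assumption about how the adjustment is applied.

```r
# Hypothetical helper (not part of Piet): shift the heterozygous band so that
# its median sits at 0.5. Assumption: only values strictly between `lower`
# and `upper` are shifted; values outside the band are left unchanged.
adjust_median <- function(baf, lower = 0.25, upper = 0.75) {
  band <- !is.na(baf) & baf > lower & baf < upper
  if (!any(band)) return(baf)             # nothing to adjust
  shift <- median(baf[band]) - 0.5        # offset of the band median from 0.5
  baf[band] <- baf[band] - shift
  baf
}

baf_raw <- c(0.01, 0.55, 0.57, 0.59, 0.99)
baf_adj <- adjust_median(baf_raw)
```

In this toy input the three values inside the band are shifted by the same offset, while the two values near 0 and 1 are untouched.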

For details of the transformation, see Staaf J., et al. (2008). The missing values of removed non-informative SNPs are initialized with the average of the k nearest SNPs plus normal random noise, in order to eliminate the dependence between adjacent SNPs.
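The two steps described above, mirroring and initialization of removed SNPs, can be sketched as follows. This is an illustration under stated assumptions rather than the code used by `BAF.transform`: `mirror_baf` and `fill_removed` are hypothetical helpers, mirroring is taken to mean reflecting BAF around 0.5, only the simple `mBAF.thd` cutoff is applied (the window-based `win.thd` step is omitted), and the noise standard deviation `noise_sd` is an arbitrary choice.

```r
# Hypothetical helper (not part of Piet): reflect BAF around 0.5 so that
# all values fall in [0.5, 1].
mirror_baf <- function(baf) abs(baf - 0.5) + 0.5

# Hypothetical helper: fill each removed position with the mean of the k
# nearest retained SNPs (nearest by index) plus normal noise.
# Assumption: noise_sd is an illustrative value, not the package's setting.
fill_removed <- function(mbaf, idx_na, k = 2, noise_sd = 0.01) {
  idx_keep <- setdiff(seq_along(mbaf), idx_na)
  for (i in idx_na) {
    n_use <- min(k, length(idx_keep))
    nearest <- idx_keep[order(abs(idx_keep - i))[seq_len(n_use)]]
    mbaf[i] <- mean(mbaf[nearest]) + rnorm(1, 0, noise_sd)
  }
  mbaf
}

set.seed(1)
baf <- c(0.02, 0.48, 0.99, 0.35, 0.66, 1.00, 0.52)
mbaf <- mirror_baf(baf)
idx_na <- which(mbaf > 0.97)     # SNPs treated as non-informative (cf. mBAF.thd)
filled <- fill_removed(mbaf, idx_na, k = 2)
```

Here the retained SNPs keep their mirrored values, and only the positions in `idx_na` are overwritten with neighbour-based estimates.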

The results are returned as a list with the following components:

`mBAF`	A vector of mirrored BAF values. Missing values of removed non-informative SNPs are initialized for downstream analysis.
`idx`	A vector of indices of those informative SNPs with values remaining after transformation.
`idx.na`	A vector of indices of the non-informative SNPs whose original values were removed.

Zhongyang (Thomas) Zhang, zhangzy@ucla.edu

Staaf J., et al. (2008) Segmentation-based detection of allelic imbalance and loss-of-heterozygosity in cancer cells using whole genome SNP arrays. Genome Biology, 9: R136+.

## simulate a sequence of BAF values for 100 SNPs
xf <- sample(x=c(0,0.5,1),size=100,replace=TRUE,prob=c(0.25,0.5,0.25)) + rnorm(100,0,0.02)
xf[xf<0] <- 0
xf[xf>1] <- 1
## insert the signal pattern of a duplication in the middle of xf
xm <- sample(x=c(0,1),size=20,replace=TRUE,prob=c(0.5,0.5)) + rnorm(20,0,0.02)
xm[xm<0] <- 0
xm[xm>1] <- 1
xf[41:60] <- 2/3*xf[41:60] + 1/3*xm
BAF <- xf
plot(BAF,xlab="SNP",ylab="BAF")

## transform BAF to mBAF
res <- BAF.transform(x=BAF, gt = NULL, mBAF.thd = 0.97, win.thd = 0.8, 
              w = 1, k = 2, median.adjust = FALSE)
plot(res$mBAF,type="n",xlab="SNP",ylab="mBAF")
points(res$idx,res$mBAF[res$idx])
points(res$idx.na,res$mBAF[res$idx.na],col="gray")