Description Usage Arguments Details Value Author(s) References See Also Examples
hapFabia
: R implementation of the hapFabia method.
hapFabia
extracts short IBD segments tagged by rare variants
from phased or unphased genotypes. hapFabia is designed for rare variants
in very large sequencing data.
The method is based on FABIA biclustering and utilizes the package
fabia.
1 2 3 4 5 6 7 8 | hapFabia(fileName,prefixPath="",sparseMatrixPostfix="_mat",
annotPostfix="_annot.txt",individualsPostfix="_individuals.txt",
labelsA=NULL,pRange="",individuals=0,lowerBP=0,upperBP=0.05,
p=10,iter=40,quant=0.01,eps=1e-5,alpha=0.03,cyc=50,non_negative=1,
write_file=0,norm=0,lap=100.0,IBDsegmentLength=50,Lt = 0.1,
Zt = 0.2,thresCount=1e-5,mintagSNVsFactor=3/4,pMAF=0.03,
haplotypes=FALSE,cut=0.8,procMinIndivids=0.1,thresPrune=1e-3,
simv="minD",minTagSNVs=6,minIndivid=2,avSNVsDist=100,SNVclusterLength=100)
|
fileName |
name of the file that contains the genotype matrix in sparse format. |
prefixPath |
path to the genotype file. |
sparseMatrixPostfix |
postfix string for the sparse matrix. |
annotPostfix |
postfix string for the SNV annotation file. |
individualsPostfix |
postfix string for the file containing the names of the individuals. |
labelsA |
annotation of the individuals as matrix individuals x 4 (individual ID, subpopulation, population, genotyping platform). |
pRange |
indicates which DNA interval is processed. |
individuals |
vector of individuals that are included into the analysis; default = 0 (all individuals). |
lowerBP |
lower bound on minor allele frequencies (MAF); however at least two occurrences are required to remove private SNVs. |
upperBP |
upper bound on minor allele frequencies (MAF) to extract rare variants. |
p |
number of biclusters per fabia iteration. |
iter |
number of fabia iterations. |
quant |
percentage of fabia loadings L that are removed after each iteration. |
eps |
lower bound on fabia variational parameter lapla; default 1e-5. |
alpha |
fabia sparseness of the loadings; default = 0.03. |
cyc |
number of cycles per fabia iteration; default 50. |
non_negative |
non-negative fabia factors and loadings if non_negative = 1; default = 1 (yes). |
write_file |
fabia results are written to files (L in sparse format), default = 0 (not written). |
norm |
fabia data normalization; default 1 (no normalization). |
lap |
minimal value of fabia's variational parameter; default 100.0. |
IBDsegmentLength |
expected typical IBD segment length in kbp. |
Lt |
percentage of largest fabia L values to consider for IBD segment extraction. |
Zt |
percentage of largest fabia Z values to consider for IBD segment extraction. |
thresCount |
p-value of random histogram hit; default 1e-5. |
mintagSNVsFactor |
percentage of the histogram tagSNVs count
threshold ( |
pMAF |
averaged and corrected (for non-uniform distributions) minor allele frequency. |
haplotypes |
|
cut |
cutoff for merging IBD segments after a hierarchical clustering; default 0.8. |
procMinIndivids |
Percentage of cluster individuals a tagSNV must tag to be considered as tagSNV for the IBD segment. |
thresPrune |
Threshold for pruning border tagSNVs based on an exponential distribution where border tagSNVs with large distances to the next tagSNV are pruned. |
simv |
Similarity measure for merging clusters: |
minTagSNVs |
Minimum matching tagSNVs for cluster similarity; otherwise the similarity is set to zero. |
minIndivid |
Minimum matching individuals for cluster similarity; otherwise the similarity is set to zero. |
avSNVsDist |
average distance between SNVs in base pairs - used
together with |
SNVclusterLength |
if |
This function uses a genotype matrix in sparse matrix format and extracts IBD segments by biclustering. First, it performs Fabia biclustering and then extracts local accumulations of loadings. Then it disentangles IBD segments and prunes off spurious correlated SNVs. Finally, it merges similar IBD segments to account for larger IBD segments that were broken during the analysis.
Annotation file ..._annot.txt
for SNVs:
first line: number individuals;
second line: number SNVs;
for each SNV a line containing following field that are blank separated: "chromosome", "physical position", "snvNames", "snvMajor", "snvMinor", "quality", "pass", "info of vcf file", "fields in vcf file", "frequency", "0/1: 1 is changed if major allele is actually minor allele".
labelsA
is a matrix ("number individuals" x 4),
where for each individual following characteristics are given:
id;
subPopulation;
population;
platform.
The probability of observing k or more
correlated SNVs in a histogram bin is computed.
The minimal k which pushes the
probability below the threshold thresCount
is used to find
accumulation of correlated SNVs that are assumed to belong to a
IBD segment.
Let p be the probability of a random minor allele match between t individuals. The probability of observing k or more matches for n SNVs in a histogram bin is given by one minus the binomial distribution F(k;n,p):
1 - F(k-1;n,p) = Pr(K >= k) = SUM(i = k to infinity; (n over i) power(p,i) power(1-p,n-i))
If q is the minor allele frequency (MAF) for one SNV,
the probability p of observing the minor allele of this SNV in
all t individuals is p=power(q,t).
The value q is given by the parameter pMAF
.
Implementation in R.
List containing
mergedIBDsegmentList |
an object of the class
|
res |
the result of FABIA. |
sPF |
individuals per loading of this FABIA result. |
annot |
annotation for the genotype data. |
IBDsegmentList1 |
an object of the class
|
IBDsegmentList2 |
an object of the class
|
mergedIBDsegmentList1 |
an object of the class
|
mergedIBDsegmentList2 |
an object of the class
|
Sepp Hochreiter
S. Hochreiter et al., ‘FABIA: Factor Analysis for Bicluster Acquisition’, Bioinformatics 26(12):1520-1527, 2010.
IBDsegment-class
,
IBDsegmentList-class
,
analyzeIBDsegments
,
compareIBDsegmentLists
,
extractIBDsegments
,
findDenseRegions
,
hapFabia
,
hapFabiaVersion
,
hapRes
,
chr1ASW1000G
,
IBDsegmentList2excel
,
identifyDuplicates
,
iterateIntervals
,
makePipelineFile
,
matrixPlot
,
mergeIBDsegmentLists
,
mergedIBDsegmentList
,
plotIBDsegment
,
res
,
setAnnotation
,
setStatistics
,
sim
,
simu
,
simulateIBDsegmentsFabia
,
simulateIBDsegments
,
split_sparse_matrix
,
toolsFactorizationClass
,
vcftoFABIA
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 | old_dir <- getwd()
setwd(tempdir())
data(simu)
namesL <- simu[["namesL"]]
haploN <- simu[["haploN"]]
snvs <- simu[["snvs"]]
annot <- simu[["annot"]]
alleleIimp <- simu[["alleleIimp"]]
write.table(namesL,file="dataSim1fabia_individuals.txt",
quote = FALSE,row.names = FALSE,col.names = FALSE)
write(as.integer(haploN),file="dataSim1fabia_annot.txt",
ncolumns=100)
write(as.integer(snvs),file="dataSim1fabia_annot.txt",
append=TRUE,ncolumns=100)
write.table(annot,file="dataSim1fabia_annot.txt",
sep = " ",quote = FALSE,row.names = FALSE,
col.names = FALSE,append=TRUE)
write(as.integer(haploN),file="dataSim1fabia_mat.txt",
ncolumns=100)
write(as.integer(snvs),file="dataSim1fabia_mat.txt",
append=TRUE,ncolumns=100)
for (i in 1:haploN) {
a1 <- which(alleleIimp[i,]>0.01)
al <- length(a1)
b1 <- alleleIimp[i,a1]
a1 <- a1 - 1
dim(a1) <- c(1,al)
b1 <- format(as.double(b1),nsmall=1)
dim(b1) <- c(1,al)
write.table(al,file="dataSim1fabia_mat.txt",
sep = " ", quote = FALSE,row.names = FALSE,
col.names = FALSE,append=TRUE)
write.table(a1,file="dataSim1fabia_mat.txt",
sep = " ", quote = FALSE,row.names = FALSE,
col.names = FALSE,append=TRUE)
write.table(b1,file="dataSim1fabia_mat.txt",
sep = " ", quote = FALSE,row.names = FALSE,
col.names = FALSE,append=TRUE)
}
hapRes <- hapFabia(fileName="dataSim1fabia",prefixPath="",
sparseMatrixPostfix="_mat",
annotPostfix="_annot.txt",individualsPostfix="_individuals.txt",
labelsA=NULL,pRange="",individuals=0,lowerBP=0,upperBP=0.15,
p=10,iter=1,quant=0.01,eps=1e-5,alpha=0.03,cyc=50,non_negative=1,
write_file=0,norm=0,lap=100.0,IBDsegmentLength=10,Lt = 0.1,
Zt = 0.2,thresCount=1e-5,mintagSNVsFactor=3/4,pMAF=0.1,
haplotypes=FALSE,cut=0.8,procMinIndivids=0.1,thresPrune=1e-3,
simv="minD",minTagSNVs=6,minIndivid=2,avSNVsDist=100,SNVclusterLength=100)
summary(hapRes$mergedIBDsegmentList)
plot(hapRes$mergedIBDsegmentList[[1]],filename="dataSim1fabia_mat")
### Another Example
simulateIBDsegmentsFabia(fileprefix="dataSim",
minruns=2,maxruns=2,snvs=1000,individualsN=100,
avDistSnvs=100,avDistMinor=10,noImplanted=1,
implanted=10,length=50,minors=30,mismatches=0,
mismatchImplanted=0.5,overlap=50)
hapRes <- hapFabia(fileName="dataSim2fabia",prefixPath="",
sparseMatrixPostfix="_mat",
annotPostfix="_annot.txt",individualsPostfix="_individuals.txt",
labelsA=NULL,pRange="",individuals=0,lowerBP=0,upperBP=0.15,
p=10,iter=1,quant=0.01,eps=1e-5,alpha=0.03,cyc=50,non_negative=1,
write_file=0,norm=0,lap=100.0,IBDsegmentLength=10,Lt = 0.1,
Zt = 0.2,thresCount=1e-5,mintagSNVsFactor=3/4,pMAF=0.1,
haplotypes=FALSE,cut=0.8,procMinIndivids=0.1,thresPrune=1e-3,
simv="minD",minTagSNVs=6,minIndivid=2,avSNVsDist=100,SNVclusterLength=100)
## Summary of the IBD segment list
summary(hapRes$mergedIBDsegmentList)
## Summary of the IBD segment
summary(hapRes$mergedIBDsegmentList[[1]])
## Plot an IBD segment
plot(hapRes$mergedIBDsegmentList[[1]],filename="dataSim2fabia_mat")
## Not run:
## It is interactive, thus dontrun!
## Plot an IBD segment list
plot(hapRes$mergedIBDsegmentList,filename="dataSim2fabia_mat")
## End(Not run)
setwd(old_dir)
|
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.