simulateIBDsegments: Generates simulated genotyping data with IBD segments
In hapFabia: Identification of very short segments of identity by descent (IBD) characterized by rare variants in large sequencing data


simulateIBDsegments: R implementation of simulateIBDsegments.

Genotype data with rare variants are simulated, and IBD segments are implanted into these data. All data sets and accompanying information are written to files.

simulateIBDsegments(fileprefix="dataSim",minruns=1,
   maxruns=100,snvs=10000,individualsN=100,avDistSnvs=100,
   avDistMinor=25,noImplanted=1,implanted=10,length=100,
   minors=20,mismatches=0,mismatchImplanted=0.5,overlap=50,
   noOverwrite=FALSE)

`fileprefix`	prefix of file names containing data generated in this simulation.
`minruns`	start index for generating multiple data sets.
`maxruns`	end index for generating multiple data sets.
`snvs`	number of SNVs in this simulation.
`individualsN`	number of individuals in this simulation.
`avDistSnvs`	average genomic distance in bases between SNVs.
`avDistMinor`	average distance between minor alleles, thus `1/avDistMinor` is the average minor allele frequency (MAF).
`noImplanted`	number of IBD segments that are implanted.
`implanted`	number of individuals belonging to a specific IBD segment.
`length`	length of the IBD segments in number of SNVs.
`minors`	number of tagSNVs for each IBD segment.
`mismatches`	number of minor allele tagSNV mismatches for individuals belonging to the IBD segment.
`mismatchImplanted`	percentage of individuals of an IBD segment that have mismatches.
`overlap`	minimal overlap of the founder interval between individuals belonging to a specific IBD segment (the interval may be broken at the ends).
`noOverwrite`	`noOverwrite=TRUE` ensures that an IBD segment is not superimposed by another IBD segment.
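
As a rough orientation for choosing argument values, the quantities implied by the defaults can be worked out in plain R. This is only back-of-the-envelope arithmetic, not part of hapFabia: the variable names `expected_maf`, `approx_span_bases`, and `n_datasets` are hypothetical, and the data set count assumes that `minruns` and `maxruns` are both inclusive.

```r
# Argument values taken from the usage defaults above
snvs        <- 10000
avDistSnvs  <- 100
avDistMinor <- 25
minruns     <- 1
maxruns     <- 100

# Average MAF as stated in the argument description: 1/avDistMinor
expected_maf <- 1 / avDistMinor

# Rough genomic span in bases, assuming SNVs lie avDistSnvs bases apart on average
approx_span_bases <- snvs * avDistSnvs

# Number of data sets, ASSUMING both run indices are inclusive
n_datasets <- maxruns - minruns + 1

print(c(expected_maf = expected_maf,
        approx_span_bases = approx_span_bases,
        n_datasets = n_datasets))
```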

The data simulation focuses on rare variants, but common variants are possible, too. Linkage disequilibrium and haplotype blocks are not simulated, except through the implanted IBD segments.
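
The idea described above (independent background variants plus an implanted shared segment) can be illustrated with a toy sketch in base R. This is not the hapFabia implementation: it uses a simplified 0/1 matrix instead of real genotypes, ignores mismatches and partial overlap, and all variable names (`geno`, `seg_start`, `tag_pos`, `carriers`, and the toy sizes) are hypothetical.

```r
set.seed(1)  # reproducible toy example

# Toy sizes (hypothetical, chosen small for readability)
n_ind       <- 10    # individuals
n_snv       <- 1000  # SNVs
avDistMinor <- 15    # background MAF is about 1/avDistMinor
seg_length  <- 100   # IBD segment length in SNVs
n_tag       <- 10    # tagSNVs inside the segment
n_carriers  <- 4     # individuals sharing the segment

# Background: independent minor alleles (1 = minor allele), hence no LD
geno <- matrix(rbinom(n_ind * n_snv, size = 1, prob = 1 / avDistMinor),
               nrow = n_ind, ncol = n_snv)

# Choose a segment start, tagSNV positions inside the segment, and carriers
seg_start <- sample.int(n_snv - seg_length + 1, 1)
tag_pos   <- sort(seg_start - 1 + sample.int(seg_length, n_tag))
carriers  <- sort(sample.int(n_ind, n_carriers))

# Implant: every carrier receives the minor allele at every tagSNV
geno[carriers, tag_pos] <- 1L
```

In this sketch the only correlation between SNVs comes from the implanted block `geno[carriers, tag_pos]`, which mirrors the statement that linkage disequilibrium arises solely from implanted IBD segments.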

Simulated data are written to files. For BEAGLE the data are written to "...beagle.txt". For PLINK the data are written to "...plink.ped", "...plink.map", and "...plink.fam". For the MCMC method the data are written to "...mcmc.genotype", "...mcmc.posmaf", and "...mcmc.initz". For RELATE the data are written to "...relate.geno", "...relate.pos", and "...relate.chr". For fabia the data are written to "...fabia_individuals.txt", "...fabia_annot.txt", and "...fabia_mat.txt".

Information on parameters for data simulation is written to "...Parameters.txt" while information on implanted IBD segments is written to "...Impl.txt".

Most information is also written in R binary ".Rda" files.
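
To see what the ".Rda" files contain after a run, one option is to load each file into a private environment and list its objects. The helper below is a hypothetical convenience function written in base R, not part of hapFabia; it assumes only that the files start with the chosen `fileprefix` and end in ".Rda", as described above.

```r
# Hypothetical helper: load every .Rda file in `dir` whose name starts with
# `prefix` and report the names of the objects stored in each file.
list_rda_objects <- function(dir = ".", prefix = "dataSim") {
  files <- list.files(dir, pattern = "\\.Rda$", full.names = TRUE)
  files <- files[startsWith(basename(files), prefix)]
  contents <- lapply(files, function(f) {
    env <- new.env()
    load(f, envir = env)  # load into a private environment, not the workspace
    ls(env)
  })
  names(contents) <- basename(files)
  contents
}
```

Calling `list_rda_objects(tempdir(), "dataSim")` after the example at the end of this page would return a named list with one entry per matching file; an empty list means no matching files were found.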

Implementation in R.

Generates simulated genotyping data with IBD segments

Sepp Hochreiter

S. Hochreiter et al., ‘FABIA: Factor Analysis for Bicluster Acquisition’, Bioinformatics 26(12):1520-1527, 2010.

IBDsegment-class, IBDsegmentList-class, analyzeIBDsegments, compareIBDsegmentLists, extractIBDsegments, findDenseRegions, hapFabia, hapFabiaVersion, hapRes, chr1ASW1000G, IBDsegmentList2excel, identifyDuplicates, iterateIntervals, makePipelineFile, matrixPlot, mergeIBDsegmentLists, mergedIBDsegmentList, plotIBDsegment, res, setAnnotation, setStatistics, sim, simu, simulateIBDsegmentsFabia, simulateIBDsegments, split_sparse_matrix, toolsFactorizationClass, vcftoFABIA

## Not run: 
old_dir <- getwd()
setwd(tempdir())

simulateIBDsegments(minruns=1,maxruns=1,snvs=1000,individualsN=10,
   avDistSnvs=100,avDistMinor=15,noImplanted=1,implanted=10,length=100,
   minors=10,mismatches=0,mismatchImplanted=0.5,overlap=50,
   noOverwrite=FALSE)

setwd(old_dir)


## End(Not run)