runFimpute | R Documentation |
Impute SNP genotypes via FImpute (Sargolzaei et al, 2014).
runFimpute(
X = NULL,
chips = NULL,
snp.coords = NULL,
vcf.file = NULL,
yieldSize = 10000,
ped = NULL,
exe = Sys.which("FImpute"),
work.dir = getwd(),
task.id = "fimpute",
params.fimpute = NULL,
clean = "none",
verbose = 1
)
X |
matrix of bi-allelic SNP genotypes encoded in number of copies of the 2nd allele, i.e. as allele doses in 0,1,2, with genotypes in rows and SNPs in columns; missing values should be encoded as NA; the maximum length of genotypes identifiers is 30 characters; if not NULL, will be used in priority even if |
chips |
if several microarrays were used, provide a vector with chip numbers which names are genotype identifiers (same as in |
snp.coords |
data.frame with SNP identifiers as row names, and two compulsory columns in that order, chromosome identifiers and coordinate/position identifiers; the maximum length of SNP identifiers is 50 characters; chromosome identifiers should be numeric; other optional column(s) contain the SNP order on the microarray(s) (maximum 10); compulsory if |
vcf.file |
path to the VCF file (if the bgzip index doesn't exist in the same directory, it will be created); used only if |
yieldSize |
number of records to yield each time the file is read from (see |
ped |
data frame of pedigree with four columns in that order, genotype identifiers (same as in |
exe |
path to the executable of FImpute |
work.dir |
directory in which the input and output files will be saved |
task.id |
identifier of the task (used in temporary and output file names) |
params.fimpute |
list of additional parameters to pass to FImpute |
clean |
remove files: none, some (temporary only), all (temporary and results) |
verbose |
verbosity level (0/1) |
list
Timothee Flutre
writeInputsFimpute
, readOutputsFimpute
, runFastphase
## Not run: ## simulate haplotypes and genotypes in a single population
set.seed(1859)
nb.inds <- 5*10^2
nb.chrs <- 1
Ne <- 10^4
chrom.len <- 1*10^6
mu <- 10^(-8)
c.rec <- 10^(-7)
genomes <- simulCoalescent(nb.inds=nb.inds,
nb.reps=nb.chrs,
pop.mut.rate=4 * Ne * mu * chrom.len,
pop.recomb.rate=4 * Ne * c.rec * chrom.len,
chrom.len=chrom.len,
get.alleles=TRUE)
nb.snps <- nrow(genomes$snp.coords)
plotHaplosMatrix(genomes$haplos[[1]]) # quick view of the amount of LD
## discard some genotypes according to a "microarray" design:
## some inds with high density of genotyped SNPs, and the others with
## low density of SNPs, these being on both microarrays
ind.names <- rownames(genomes$genos)
inds.high <- sample(x=ind.names, size=floor(0.4 * nb.inds))
inds.low <- setdiff(ind.names, inds.high)
snp.names <- colnames(genomes$genos)
mafs <- estimSnpMaf(X=genomes$genos)
length(idx <- which(mafs >= 0.05))
snps.high <- names(idx)
snps.high.only <- sample(x=snps.high, size=floor(0.7*length(snps.high)))
dim(X <- genomes$genos[ind.names, snps.high])
X.na <- X
X.na[inds.low, snps.high.only] <- NA
sum(is.na(X.na)) / length(X.na)
plotGridMissGenos(X=X.na)
## perform imputation
snp.coords <- genomes$snp.coords[colnames(X.na),]
snp.coords$chr <- as.numeric(sub("chr", "", snp.coords$chr))
out.FI <- runFimpute(X=X.na, snp.coords=snp.coords, clean="all")
X.imp.FI <- out.FI$genos.imp
## assess imputation accuracy
genomes$haplos[[1]][1:4, 1:7]
genomes$genos[1:2, 1:7]
X.na[1:2, 1:6]
X.imp.FI[1:2, 1:6]
100 * sum(X.imp.FI != X) / sum(is.na(X.na))
## End(Not run)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.