View source: R/import.hapmap.R
Input: Hapmap-formatted SNP data, phenotype data
Output: Matched data files (genotype, numerical, SNP information, QC information, and phenotype) with QC and/or imputation.
import.hapmap(
  genotype = NULL,
  phenotype = NULL,
  input.type = c("object", "path"),
  save.path,
  y.col = NULL,
  y.id.col = 2,
  family = "gaussian",
  normalization = TRUE,
  remove.missingY = TRUE,
  imputation = FALSE,
  impute.type = c("distribution", "mode"),
  QC = TRUE,
  callrate.range = c(0, 1),
  maf.range = c(0, 1),
  HWE.range = c(0, 1),
  heterozygosity.range = c(0, 1)
)
genotype |
Either an R object or a file path. The genotype data must be a matrix, not a data.frame, with dimension |
phenotype |
Either an R object or a file path. The phenotype data is an |
input.type |
Default is "object". If |
save.path |
Path to the directory in which all output files are saved. If save.path already exists, sp.gwas checks whether it contains an output file. Note that if an output RData file is already present in "save.path", sp.gwas simply loads that file (.RData) and does not produce results for the new "genotype" and "phenotype". |
y.col |
The column indices of the phenotypes. At most 4 phenotypes can be considered, so that their plots remain readable. Default is 2. |
y.id.col |
The column of sample ID in the phenotype data file. Default is 1. |
family |
The family of the response variable (phenotype): "gaussian" for a continuous response, "binomial" for binary, "poisson" for count, etc. Currently, all phenotypes must share the same family. For more details, see the function( |
normalization |
If TRUE, phenotypes are transformed toward a normal shape using a Box-Cox transformation when all phenotypes are positive. |
remove.missingY |
If TRUE, the samples with missing values in phenotype data are removed. Accordingly, the corresponding genotype samples are also filtered out. Default is TRUE. |
imputation |
TRUE or FALSE for whether imputation will be conducted. |
impute.type |
Two imputation methods are supported; they apply only when imputation=TRUE. Default is "distribution", which imputes a genotype from the allele distribution. The other is "mode", which imputes the most frequent genotype. |
QC |
TRUE or FALSE for whether QC for SNPs will be conducted. |
callrate.range |
A numeric vector indicating the range of non-missing proportion. Default is c(0, 1). |
maf.range |
A numeric vector indicating the range of minor allele frequency (MAF) to be used. Default is c(0, 1). |
HWE.range |
A numeric vector indicating the range of p-values from the Hardy-Weinberg equilibrium test to be used. Default is c(0, 1). |
heterozygosity.range |
A numeric vector indicating the range of heterozygosity values to be used. In some cases, heterozygosity higher than expected indicates low-quality variants or sample contamination. Default is c(0, 1). |
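To make the QC ranges more concrete, here is a minimal sketch of how per-SNP call rate, MAF and heterozygosity could be computed and compared against ranges such as callrate.range. It is not the sp.gwas implementation: the helpers `snp_qc_summary` and `in_range` are hypothetical, the genotypes are assumed to be already coded as allele counts (0, 1, 2, NA for missing) rather than HapMap strings, and treating the ranges as closed intervals is also an assumption.

```r
# Hypothetical helper, not part of sp.gwas: summarise one SNP that is
# coded as allele counts (0, 1, 2) with NA for missing calls.
snp_qc_summary <- function(g) {
  called <- g[!is.na(g)]
  callrate <- length(called) / length(g)   # proportion of non-missing calls
  freq <- mean(called) / 2                 # frequency of the counted allele
  maf <- min(freq, 1 - freq)               # minor allele frequency
  heterozygosity <- mean(called == 1)      # proportion of heterozygous calls
  c(callrate = callrate, maf = maf, heterozygosity = heterozygosity)
}

# Assumption: a range c(lower, upper) is treated as a closed interval.
in_range <- function(x, range) x >= range[1] & x <= range[2]

# Toy data: rows are samples, columns are SNPs.
geno <- cbind(
  snp1 = c(0, 1, 2, 1, 0, 1, 0, 2, 1, 0),
  snp2 = c(0, 0, NA, 0, NA, 0, 0, 0, NA, 0),
  snp3 = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1)
)

# One row per SNP, one column per statistic.
qc <- t(apply(geno, 2, snp_qc_summary))

# A SNP is kept only if every statistic lies inside its range.
keep <- in_range(qc[, "callrate"], c(0.95, 1)) &
  in_range(qc[, "maf"], c(0.001, 1)) &
  in_range(qc[, "heterozygosity"], c(0, 0.5))

print(qc)
print(keep)
```

In this toy data only snp1 passes: snp2 fails the call-rate and MAF ranges, and snp3 is entirely heterozygous, so it falls outside the heterozygosity range.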
The Hardy-Weinberg equilibrium test is taken from the "genetics" package. In the imputation process, the empirical allele frequencies are calculated first. If a beta distribution is used as the prior for the allele frequency, then the posterior distribution of the allele frequency is also a beta distribution. Accordingly, missing values are imputed with samples from the posterior distribution.
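The imputation idea described above can be sketched as follows. This is an illustration under stated assumptions, not the package's code: `impute_snp` is a hypothetical helper, genotypes are assumed to be coded as allele counts (0, 1, 2, NA), the prior is assumed to be Beta(1, 1), and drawing each missing genotype as Binomial(2, p) (that is, under Hardy-Weinberg proportions) is an assumption about how a posterior draw of the allele frequency is turned into a genotype.

```r
# Hypothetical helper, not the sp.gwas implementation.
# g: one SNP coded as allele counts (0, 1, 2) with NA for missing calls.
impute_snp <- function(g, type = c("distribution", "mode"), prior = c(1, 1)) {
  type <- match.arg(type)
  miss <- is.na(g)
  if (!any(miss)) return(g)
  called <- g[!miss]
  if (length(called) == 0) stop("no observed genotypes for this SNP")

  if (type == "mode") {
    # Fill missing calls with the most frequent observed genotype.
    counts <- table(called)
    g[miss] <- as.numeric(names(counts)[which.max(counts)])
    return(g)
  }

  # Empirical allele counts from the observed genotypes.
  n_alt <- sum(called)
  n_ref <- 2 * length(called) - n_alt

  # Beta prior + binomial allele counts => Beta posterior (conjugacy).
  p <- rbeta(sum(miss), prior[1] + n_alt, prior[2] + n_ref)

  # Assumption: draw each missing genotype as Binomial(2, p).
  g[miss] <- rbinom(sum(miss), size = 2, prob = p)
  g
}

set.seed(1)
g <- c(0, 1, NA, 2, 0, NA, 1, 0, 0, 1)
imputed_dist <- impute_snp(g, "distribution")
imputed_mode <- impute_snp(g, "mode")
```

Observed genotypes are left untouched in both modes; only the NA entries are filled, and the "distribution" result changes with the random seed.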
A folder containing a genomic data set in which the genotype and phenotype samples are matched, and in which quality control steps can be applied to the genotype data.
Kipoong Kim <kkp7700@gmail.com>
genotype <- sp.gwas::genotype # load("genotype.rda")
phenotype <- sp.gwas::phenotype # load("phenotype.rda")

# object
import.hapmap(genotype = genotype,
              phenotype = phenotype,
              input.type = c("object", "path")[1],
              imputation = FALSE,
              # if TRUE, the following QC steps (callrate, maf, HWE, heterozygosity) are conducted.
              QC = TRUE,
              callrate.range = c(0.95, 1),
              maf.range = c(1e-3, 1),
              HWE.range = c(0, 1),
              heterozygosity.range = c(0, 1),
              # if TRUE, the samples with any missing phenotypes are filtered out in all data.
              remove.missingY = TRUE,
              save.path = "./EXAMPLE_obj",
              y.id.col = 1,
              y.col = 2:4,
              # if family is not "gaussian", i.e. not continuous variable, normalization should be FALSE
              normalization = FALSE,
              family = "gaussian")

# path
write.table(x = sp.gwas::genotype, file = "./genotype.csv", row.names = FALSE, col.names = FALSE, sep = ",")
write.table(x = sp.gwas::phenotype, file = "./phenotype.csv", row.names = FALSE, sep = ",")
import.hapmap(genotype = "./genotype.csv",
              phenotype = "./phenotype.csv",
              input.type = c("object", "path")[2],
              # if TRUE, the following QC steps (callrate, maf, HWE, heterozygosity) are conducted.
              QC = TRUE,
              callrate.range = c(0.95, 1),
              maf.range = c(1e-3, 1),
              HWE.range = c(0, 1),
              heterozygosity.range = c(0, 0.5),
              # if TRUE, the samples with any missing phenotypes are filtered out in all data.
              remove.missingY = TRUE,
              save.path = "./EXAMPLE_path",
              y.id.col = 1,
              y.col = 2:4,
              # if family is not "gaussian", i.e. not continuous variable, normalization should be FALSE
              normalization = FALSE,
              family = "gaussian")