import.hapmap: A function to import the hapmap formatted SNP data and the...

In statpng/sp.gwas: An integrated tool for an analysis of high-dimensional genomic data in Genome-Wide Association Study


View source: R/import.hapmap.R

Input: Hapmap-formatted SNP data, phenotype data

Output: Matched data files (genotype, numerical, SNP information, QC information, and phenotype) with QC and/or imputation.

import.hapmap(
  genotype = NULL,
  phenotype = NULL,
  input.type = c("object", "path"),
  save.path,
  y.col = NULL,
  y.id.col = 2,
  family = "gaussian",
  normalization = TRUE,
  remove.missingY = TRUE,
  imputation = FALSE,
  impute.type = c("distribution", "mode"),
  QC = TRUE,
  callrate.range = c(0, 1),
  maf.range = c(0, 1),
  HWE.range = c(0, 1),
  heterozygosity.range = c(0, 1)
)

`genotype`	Either an R object or a file path. The genotype data must be a matrix, not a data.frame, of dimension `p` by `(n+11)`. It follows the hapmap format, with (rs, allele, chr, pos) in the first four columns (1-4) and (strand, assembly, center, protLSID, assayLSID, panel, Qcode) in the next seven columns (5-11). If NULL, the user can choose a path interactively.
`phenotype`	Either an R object or a file path. The phenotype data is an `n` by `p` matrix. Because the first few columns may hold attributes rather than the phenotypes themselves, you should set the arguments y.col and y.id.col, which give the columns of the phenotypes to be analyzed and the column of sample IDs. If NULL, the user can choose a path interactively.
`input.type`	Default is "object". If `input.type` is "object", genotype/phenotype are passed as R objects; if "path", they are passed as file paths. If you pass an object, make sure that every column of the genotype data has class "character".
`save.path`	The directory in which all output files are saved. If save.path already exists, sp.gwas checks whether it contains an output file. Note that if an output RData file exists in "save.path", sp.gwas simply loads those output files (.RData) and does not produce results for the new "genotype" and "phenotype".
`y.col`	The columns of the phenotypes to be analyzed. At most 4 phenotypes can be considered, so that they can be plotted clearly. Default is NULL.
`y.id.col`	The column of sample IDs in the phenotype data file. Default is 2.
`family`	The family of the response variable (phenotype): "gaussian" for a continuous response, "binomial" for binary, "poisson" for count, etc. Currently, the same family must be used for all phenotypes. For more details, see `stats::glm`. Default is "gaussian".
`normalization`	If TRUE, phenotypes are transformed toward a normal shape using the Box-Cox transformation when all phenotypes are positive. Default is TRUE.
`remove.missingY`	If TRUE, the samples with missing values in phenotype data are removed. Accordingly, the corresponding genotype samples are also filtered out. Default is TRUE.
`imputation`	TRUE or FALSE for whether imputation will be conducted.
`impute.type`	The imputation method, used only when imputation=TRUE. Two methods are supported. Default is "distribution", which imputes a genotype from the allele distribution. The other is "mode", which imputes the most frequent genotype.
`QC`	TRUE or FALSE for whether QC for SNPs will be conducted.
`callrate.range`	A numeric vector indicating the range of non-missing proportion. Default is c(0, 1).
`maf.range`	A numeric vector indicating the range of minor allele frequency (MAF) to be used. Default is c(0, 1).
`HWE.range`	A numeric vector indicating the range of p-values from the Hardy-Weinberg Equilibrium test to be used. Default is c(0, 1).
`heterozygosity.range`	A numeric vector indicating the range of heterozygosity values to be used. In some cases, heterozygosity higher than expected indicates low-quality variants or sample contamination. Default is c(0, 1).
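To make the QC ranges concrete, the following is a minimal base-R sketch of how call rate, MAF, and heterozygosity could be computed per SNP and compared against the ranges. It is not the package's implementation: the helpers `snp_qc` and `in_range` are hypothetical, genotype calls are assumed to be two-letter strings such as "AG", missing calls are assumed to be coded "NN" or NA, and the HWE filter is left out.

```r
# Hypothetical helper: QC statistics for one SNP, given a character vector of
# two-letter genotype calls. "NN" and NA are assumed to mean "missing".
snp_qc <- function(calls, missing.codes = c("NN", NA)) {
  observed <- calls[!(calls %in% missing.codes)]
  callrate <- length(observed) / length(calls)

  # Split every observed call into its two alleles.
  a1 <- substr(observed, 1, 1)
  a2 <- substr(observed, 2, 2)
  allele.freq <- table(c(a1, a2)) / (2 * length(observed))

  # MAF: frequency of the less common allele (0 for a monomorphic SNP).
  maf <- if (length(allele.freq) < 2) 0 else min(allele.freq)

  # Heterozygosity: proportion of observed calls with two different alleles.
  heterozygosity <- mean(a1 != a2)

  c(callrate = callrate, maf = as.numeric(maf), heterozygosity = heterozygosity)
}

# Hypothetical helper: TRUE where x lies inside the closed interval `range`.
in_range <- function(x, range) x >= range[1] & x <= range[2]

# Toy genotype calls: rows are SNPs, columns are samples.
geno <- rbind(
  snp1 = c("AA", "AG", "GG", "AA", "AA", "AG", "AA", "AA", "AA", "AA"),
  snp2 = c("CC", "CC", "NN", "NN", "CT", "CC", "CC", "CC", "CC", "CC"),
  snp3 = c("TT", "TT", "TT", "TT", "TT", "TT", "TT", "TT", "TT", "TT")
)

qc <- t(apply(geno, 1, snp_qc))   # one row of statistics per SNP

keep <- in_range(qc[, "callrate"], c(0.95, 1)) &
  in_range(qc[, "maf"], c(1e-3, 1)) &
  in_range(qc[, "heterozygosity"], c(0, 1))
```

With these toy data, snp2 falls below the call-rate range and snp3 is monomorphic, so only snp1 passes all three filters.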

The Hardy-Weinberg Equilibrium test is taken from the "genetics" package. In the imputation process, we first calculate the empirical allele frequencies. If a beta distribution is used as the prior for the allele frequency, then the posterior distribution of the allele frequency is also a beta distribution. Accordingly, we impute the missing values with samples from the posterior distribution.
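The beta-posterior imputation described above can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's code: the function name `impute_beta` is hypothetical, genotypes are assumed to be coded as allele counts 0/1/2 with NA for missing, the prior is assumed to be Beta(1, 1), and drawing each missing genotype as Binomial(2, p) is an additional assumption that the text does not specify.

```r
# Hypothetical helper: impute missing allele counts (0/1/2) for one SNP.
impute_beta <- function(dosage, prior = c(1, 1)) {
  miss <- is.na(dosage)
  if (!any(miss)) return(dosage)

  k <- sum(dosage[!miss])        # observed copies of the counted allele
  n.alleles <- 2 * sum(!miss)    # total number of observed alleles

  # Conjugate update: a Beta(a, b) prior combined with binomial counts gives
  # a Beta(a + k, b + n.alleles - k) posterior for the allele frequency.
  p <- rbeta(sum(miss), prior[1] + k, prior[2] + n.alleles - k)

  # Assumption: each missing genotype is drawn as Binomial(2, p), using one
  # posterior draw of the allele frequency per missing value.
  dosage[miss] <- rbinom(sum(miss), size = 2, prob = p)
  dosage
}

set.seed(1)
x <- c(0, 1, NA, 2, 0, NA, 1, 0)
imputed <- impute_beta(x)
```

Observed genotypes are left untouched; only the NA entries are replaced, and the result varies with the random seed.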

A folder containing a genomic data set in which the samples of the genotype and phenotype data are matched and, optionally, quality control steps have been applied to the genotype data.
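As an illustration of what "matched" means here, the sketch below aligns the samples of a toy hapmap-style matrix with a toy phenotype table in base R. It is only a sketch of the idea, not the function's internals. All data are made up, and it assumes that the first row of the genotype matrix holds the header, so that sample IDs sit in columns 12 onward of row 1.

```r
# Toy hapmap-style matrix. Assumption: row 1 is the header row.
meta <- c("rs", "allele", "chr", "pos", "strand", "assembly", "center",
          "protLSID", "assayLSID", "panel", "Qcode")
geno <- rbind(
  c(meta, "S1", "S2", "S3", "S4"),
  c("snp1", "A/G", "1", "100", rep(NA, 7), "AA", "AG", "GG", "AA"),
  c("snp2", "C/T", "1", "200", rep(NA, 7), "CC", "CT", "NN", "CC")
)

# Toy phenotype table: sample ID in column 1, one phenotype in column 2.
pheno <- data.frame(ID = c("S3", "S1", "S5", "S2"),
                    y1 = c(2.1, NA, 3.3, 1.7),
                    stringsAsFactors = FALSE)

# Drop samples with a missing phenotype (mirrors remove.missingY = TRUE).
pheno <- pheno[complete.cases(pheno), ]

# Sample IDs present in both data sets, in genotype column order.
geno.ids <- geno[1, -(1:11)]
common <- intersect(geno.ids, pheno$ID)

# Keep the 11 annotation columns plus the matched sample columns, and
# reorder the phenotype rows to the same sample order.
geno.matched <- geno[, c(1:11, 11 + match(common, geno.ids))]
pheno.matched <- pheno[match(common, pheno$ID), ]
```

Here S1 is dropped for its missing phenotype, S4 has no phenotype record, and S5 has no genotype, so only S2 and S3 remain in both matched objects.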

Kipoong Kim <kkp7700@gmail.com>

genotype <- sp.gwas::genotype # load("genotype.rda")
phenotype <- sp.gwas::phenotype # load("phenotype.rda")

# object
import.hapmap(genotype = genotype, 
              phenotype = phenotype, 
              input.type = c("object", "path")[1], 
              imputation = FALSE, 
              
              # if TRUE, the following QC steps (callrate, maf, HWE, heterozygosity) are conducted.
              QC = TRUE,  
              
              callrate.range = c(0.95, 1),
              maf.range = c(1e-3, 1),
              HWE.range = c(0, 1),
              heterozygosity.range = c(0, 1),
              
              # if TRUE, the samples with any missing phenotypes are filtered out in all data.
              remove.missingY = TRUE,
              
              save.path = "./EXAMPLE_obj",
              y.id.col = 1, 
              y.col = 2:4, 
              
              #if family is not "gaussian", i.e. not continuous variable, normalization should be FALSE
              normalization = FALSE,
              family="gaussian")



# path

write.table( x = sp.gwas::genotype, file = "./genotype.csv", row.names = FALSE, col.names = FALSE, sep=",")
write.table( x = sp.gwas::phenotype, file = "./phenotype.csv", row.names = FALSE, sep="," )

import.hapmap(genotype = "./genotype.csv", 
              phenotype = "./phenotype.csv", 
              input.type = c("object", "path")[2], 
              QC = TRUE,  # if TRUE, the following QC steps (callrate, maf, HWE, heterozygosity) are conducted.
              callrate.range = c(0.95, 1),
              maf.range = c(1e-3, 1),
              HWE.range = c(0, 1),
              heterozygosity.range = c(0, 0.5),
              remove.missingY = TRUE,   # if TRUE, the samples with any missing phenotypes are filtered out in all data.
              save.path = "./EXAMPLE_path",
              y.id.col = 1, 
              y.col = 2:4, 
              normalization = FALSE, #if family is not "gaussian", i.e. not continuous variable, normalization should be FALSE
              family="gaussian")

statpng/sp.gwas documentation built on Dec. 17, 2020, 5:55 a.m.
