GPART: Partitioning genodata based on the result obtained by using...

Description

GPART partitions the given genotype data using the result obtained by Big-LD and gene region information. The algorithm partitions the whole sequence into subsequences whose sizes do not exceed the given threshold.

Usage

GPART(geno=NULL, SNPinfo=NULL, geneinfo=NULL, genofile=NULL,
SNPinfofile=NULL, geneinfofile=NULL, geneDB = c("ensembl","ucsc"),
assembly = c("GRCh38", "GRCh37"), geneid = "hgnc_symbol",ensbversion = NULL,
chrN=NULL, startbp=-Inf, endbp=Inf, BigLDresult=NULL, minsize=4, maxsize=50,
LD=c("r2", "Dprime"), CLQcut=0.5, CLQmode=c("density", "maximal"), MAFcut = 0.05,
GPARTmode=c("geneBased", "LDblockBased"),
Blockbasedmode=c("onlyBlocks", "useGeneRegions"))

Arguments

geno

Data frame or matrix of additive genotype data; each column holds the additive genotype of one SNP.

SNPinfo

Data frame or matrix of SNP information. The 1st column is the rsID and the 2nd column is the bp position.

geneinfo

Data frame or matrix of gene information (1st col: gene name, 2nd col: chromosome, 3rd col: start bp, 4th col: end bp).
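The three in-memory inputs described above can be mocked up as follows. This is only a sketch in base R: the column names, rsIDs, gene names, and positions are invented for illustration, and only the column order follows the descriptions above.

```r
# Toy inputs in the layout described above; every value here is made up.
set.seed(1)
n_samples <- 6
n_snps <- 5

# geno: one column per SNP, entries are additive genotype codes 0/1/2.
geno_toy <- matrix(sample(0:2, n_samples * n_snps, replace = TRUE),
                   nrow = n_samples, ncol = n_snps)
colnames(geno_toy) <- paste0("rs", seq_len(n_snps))  # hypothetical rsIDs

# SNPinfo: 1st column rsID, 2nd column bp position (column names are assumed).
SNPinfo_toy <- data.frame(rsID = colnames(geno_toy),
                          bp = c(1000, 1500, 2100, 2600, 3300),
                          stringsAsFactors = FALSE)

# geneinfo: gene name, chromosome, start bp, end bp (column names are assumed).
geneinfo_toy <- data.frame(genename = c("GENE_A", "GENE_B"),
                           chrN = c(22, 22),
                           start.bp = c(900, 2500),
                           end.bp = c(2200, 3400),
                           stringsAsFactors = FALSE)

# Basic consistency checks one might run before calling GPART.
stopifnot(ncol(geno_toy) == nrow(SNPinfo_toy))
stopifnot(all(geneinfo_toy$start.bp <= geneinfo_toy$end.bp))
```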

genofile

Character constant; genotype data file name (supported formats: .txt, .ped, .raw, .traw, .vcf).

SNPinfofile

Character constant; SNPinfo data file name (supported formats: .txt, .map).

geneinfofile

A character constant; file containing the gene information (1st col: gene name, 2nd col: chromosome, 3rd col: start bp, 4th col: end bp).

geneDB

A character constant; database type for gene information. Set "ensembl" to get gene info from Ensembl, or set "ucsc" to get gene info from the UCSC Genome Browser. (See package "biomaRt" or packages "Homo.sapiens"/"TxDb.Hsapiens.UCSC.hg38.knownGene"/"TxDb.Hsapiens.UCSC.hg19.knownGene" for details.)

assembly

A character constant; set "GRCh37" for GRCh37, or set "GRCh38" for GRCh38

geneid

A character constant; when gene information is obtained via geneDB, specify the identifier type to use as the gene name. Default is "hgnc_symbol". (E.g. "ensembl_gene_id" for geneDB = "ensembl"; "ENTREZID"/"ENSEMBL"/"REFSEQ"/"TXNAME" for geneDB = "ucsc". See package "biomaRt" or package "Homo.sapiens" for details.)

ensbversion

An integer constant; the Ensembl release version to use when gene information is obtained with geneDB = "ensembl" and assembly = "GRCh38".

chrN

Numeric (or character) constant (or vector); chromosome number(s) to use. If NULL (default), all chromosomes are used.

startbp

Numeric constant; starting bp position of the chrN. Default -Inf.

endbp

Numeric constant; last bp position of the chrN. Default Inf.

BigLDresult

Data frame; a result obtained by the BigLD function. If NULL (default), GPART first executes BigLD to obtain the LD block estimation result.

minsize

Numeric constant; the lower bound of number of SNPs in a partition.

maxsize

Numeric constant; the upper bound of number of SNPs in a partition.
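To make the meaning of the two bounds concrete, the sketch below checks a hand-made partition table against minsize and maxsize. The data frame and its column names (start.index, end.index) are hypothetical and are not taken from GPART output; the point is only that partition size is counted in SNPs.

```r
# Hypothetical partitions given as first/last SNP index (invented values).
parts <- data.frame(start.index = c(1, 11, 14),
                    end.index   = c(10, 13, 70))

minsize <- 4   # default lower bound from the Usage section
maxsize <- 50  # default upper bound from the Usage section

# Size of a partition = number of SNPs it spans, both ends inclusive.
parts$size <- parts$end.index - parts$start.index + 1

# Flag partitions that would violate either bound.
parts$too_small <- parts$size < minsize
parts$too_large <- parts$size > maxsize
print(parts)
```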

LD

Character constant; LD measure to use, r2 or Dprime.

CLQcut

Numeric constant; threshold for the correlation value |r|, between 0 and 1.
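As an illustration of thresholding on |r|, the sketch below computes pairwise correlations between additive genotype columns and marks the pairs that reach the cutoff. It uses only base R on invented data; whether the package compares with >= or >, and how it uses the resulting pairs internally, is an assumption here.

```r
# Invented additive genotypes for three SNPs (rows are samples).
g <- cbind(s1 = c(0, 1, 2, 1, 0, 2),
           s2 = c(0, 1, 2, 1, 0, 1),
           s3 = c(2, 0, 1, 2, 1, 0))

CLQcut <- 0.5  # default value from the Usage section

# Pairwise Pearson correlation between SNP columns.
r <- cor(g)

# Logical matrix: TRUE where |r| reaches the threshold
# (the >= comparison is an assumption for this sketch).
adj <- abs(r) >= CLQcut
diag(adj) <- FALSE  # ignore each SNP's correlation with itself
print(adj)
```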

CLQmode

Character constant; the way to give priority among detected cliques. If CLQmode = "density", the algorithm gives priority to the clique with the largest value of (number of SNPs)/(range of clique); if CLQmode = "maximal", the algorithm gives priority to the largest clique. The default is "density".
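The difference between the two priority rules can be sketched on toy data. The cliques and positions below are invented, pick_clique is a hypothetical helper, and "range of clique" is assumed here to mean the bp span between the first and last SNP of the clique.

```r
# Invented bp positions for six SNPs and two candidate cliques (SNP indices).
bp <- c(100, 200, 300, 5000, 5100, 9000)
cliques <- list(a = c(1, 2, 3),
                b = c(1, 2, 4, 6))

# Number of SNPs in each clique.
size <- vapply(cliques, length, integer(1))

# Span of each clique in bp (assumed meaning of "range of clique").
span <- vapply(cliques, function(ix) diff(range(bp[ix])), numeric(1))

# Density score: (number of SNPs) / (range of clique).
dens <- size / span

# Hypothetical helper: name of the clique preferred under each mode.
pick_clique <- function(mode = c("density", "maximal")) {
  mode <- match.arg(mode)
  score <- if (mode == "density") dens else size
  names(cliques)[which.max(score)]
}

pick_clique("density")
pick_clique("maximal")
```

Here the compact three-SNP clique wins under "density", while the larger but widely spread four-SNP clique wins under "maximal".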

MAFcut

Numeric constant; the MAF threshold. Default 0.05.
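A minor allele frequency filter on additive genotypes can be sketched in base R as below. The data are invented, and the strict > comparison is an assumption; the package may treat the boundary differently.

```r
# Invented additive genotypes (0/1/2 copies of the counted allele), 10 samples.
g <- cbind(common      = c(0, 1, 2, 1, 0, 1, 2, 0, 1, 1),
           monomorphic = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
           flipped     = c(2, 2, 2, 2, 2, 2, 2, 2, 1, 1))

MAFcut <- 0.05  # default value from the Usage section

# Frequency of the counted allele: mean genotype divided by 2.
p <- colMeans(g, na.rm = TRUE) / 2

# Minor allele frequency: the smaller of p and 1 - p.
maf <- pmin(p, 1 - p)

# Keep SNPs whose MAF exceeds the threshold (strict > is an assumption).
keep <- maf > MAFcut
g_filtered <- g[, keep, drop = FALSE]
colnames(g_filtered)
```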

GPARTmode

Character constant; GPART algorithm method to use, "geneBased" or "LDblockBased". Default is "geneBased".

Blockbasedmode

Character constant; when GPARTmode = "LDblockBased", specify the LD-block-based method as "onlyBlocks" (the "LD block based only" algorithm) or "useGeneRegions" (the LD-block-based algorithm that also uses gene information).

Value

GPART returns a data frame containing nine fields for each partition: chromosome, index numbers of the first and last SNP, rsIDs of the first and last SNP, base-pair positions of the first and last SNP, block size, and block name.

Author(s)

Sun Ah Kim <sunny03@snu.ac.kr>, Yun Joo Yoo <yyoo@snu.ac.kr>

See Also

BigLD

Examples

data(geno)
data(SNPinfo)
data(geneinfo)
GPART(geno=geno, SNPinfo=SNPinfo, geneinfo=geneinfo, startbp = 16058400, endbp = 16076500)

sunnyeesl/gpart documentation built on May 9, 2019, 7:40 a.m.