loadplink: Load binary PLINK data

ghap.loadplink    R Documentation

Load binary PLINK data

Description

This function loads binary PLINK files (bed/bim/fam) and converts them into a native GHap.plink object.

Usage

  ghap.loadplink(input.file = NULL, bed.file = NULL,
                 bim.file = NULL, fam.file = NULL,
                 ncores = 1, verbose = TRUE)

Arguments

If all input files share the same prefix, the user can use the following shortcut option:

input.file

Prefix for input files.

For backward compatibility, the user can still point to input files separately:

bed.file

The binary genotype matrix (in SNP-major format).

bim.file

Variant map file.

fam.file

Pedigree (family) file.

To use multiple cores, or to turn progress messages on or off, use:

ncores

A numeric value specifying the number of cores to use while loading the input files (default = 1).

verbose

A logical value specifying whether log messages should be printed (default = TRUE).
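
The relationship between the prefix shortcut and the separate file arguments can be sketched in base R. The helper name expand_prefix is hypothetical and not part of GHap; it only assumes that the three files share a prefix and carry the standard .bed, .bim and .fam extensions, as in the example at the end of this page.

```r
# Hypothetical helper (not part of GHap): derive the three PLINK file
# names that a shared prefix stands for.
expand_prefix <- function(prefix) {
  list(bed.file = paste0(prefix, ".bed"),
       bim.file = paste0(prefix, ".bim"),
       fam.file = paste0(prefix, ".fam"))
}

files <- expand_prefix("example")
print(files$bed.file)
print(files$bim.file)
print(files$fam.file)
```

Under this assumption, passing the prefix alone should be equivalent to passing the three resulting names through bed.file, bim.file and fam.file.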

Value

The returned GHap.plink object is a list with components:

nsamples

An integer value for the sample size.

nmarkers

An integer value for the number of markers.

nsamples.in

An integer value for the number of active samples.

nmarkers.in

An integer value for the number of active markers.

pop

A character vector relating genotypes to populations. This information is obtained from the FID (1st) column in the fam file.

id

A character vector mapping genotypes to samples. This information is obtained from the IID (2nd) column in the fam file.

id.in

A logical vector indicating active samples. By default, all samples are set to TRUE.

sire

A character vector indicating sire names, as provided in the SID (3rd) column of the fam file.

dam

A character vector indicating dam names, as provided in the DID (4th) column of the fam file.

sex

A character vector indicating individual sex, as provided in the SEX (5th) column of the fam file. Codes are converted as follows: 0 = NA, 1 = Male and 2 = Female.

chr

A character vector indicating chromosome identity for each marker.

marker

A character vector containing marker names.

marker.in

A logical vector indicating active markers. By default, all markers are set to TRUE.

cm

A numeric vector with genetic positions for markers. This information is obtained from the third column of the bim file. If genetic positions are absent (coded as "0"), they are approximated from physical positions assuming 1 Mb ~ 1 cM.

bp

A numeric vector with physical positions for markers.

A0

A character vector with reference alleles. For convenience, this information is obtained from the 6th column of the bim file. If "--keep-allele-order" is not used while generating the PLINK binary file, A0 will correspond to the major allele.

A1

A character vector with alternative alleles. As for A0, if "--keep-allele-order" is not used, A1 will correspond to the minor allele.

plink

A character value giving the path to the binary genotype matrix.
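
How some of these components relate to the raw fam and bim columns can be sketched in base R. This is an illustration under stated assumptions, not the package's implementation: the toy tables are invented, the column positions follow the descriptions above, and the substitution of physical for genetic positions is applied marker by marker, which may differ from how ghap.loadplink handles it.

```r
# Toy fam table (invented data): FID, IID, SID, DID, SEX, phenotype.
fam <- read.table(text = "
popA ind1 0    0    1 -9
popA ind2 0    0    2 -9
popB ind3 ind1 ind2 0 -9
", colClasses = "character")

pop  <- fam[[1]]  # FID (1st column)
id   <- fam[[2]]  # IID (2nd column)
sire <- fam[[3]]  # SID (3rd column)
dam  <- fam[[4]]  # DID (4th column)

# Recode SEX (5th column): 0 = NA, 1 = Male, 2 = Female.
sex_codes <- c("0" = NA, "1" = "Male", "2" = "Female")
sex <- unname(sex_codes[fam[[5]]])

# Toy bim table (invented data): chr, marker, cM, bp, allele, allele.
bim <- read.table(text = "
1 snp1 0 1500000 A G
1 snp2 0 2750000 C T
2 snp3 0  500000 G A
", colClasses = "character")

chr    <- bim[[1]]
marker <- bim[[2]]
cm     <- as.numeric(bim[[3]])
bp     <- as.numeric(bim[[4]])
A0     <- bim[[6]]  # reference allele, taken from the 6th column

# Assumption: a genetic position of 0 means "absent", so it is
# approximated from the physical position using 1 Mb ~ 1 cM.
cm <- ifelse(cm == 0, bp / 1e6, cm)

print(sex)
print(cm)
```

Reading every column as character avoids R interpreting an allele column made up only of "T" as logical TRUE.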

Author(s)

Yuri Tani Utsunomiya <ytutsunomiya@gmail.com>

Examples


# #### DO NOT RUN IF NOT NECESSARY ###
# 
# # Copy the example PLINK data to the current working directory
# exfiles <- ghap.makefile(dataset = "example",
#                          format = "plink",
#                          verbose = TRUE)
# file.copy(from = exfiles, to = "./")
# 
# ### RUN ###
# 
# # Load data using prefix
# plink <- ghap.loadplink("example")
# 
# # Load data using file names
# plink <- ghap.loadplink(bed.file = "example.bed",
#                         bim.file = "example.bim",
#                         fam.file = "example.fam")



GHap documentation built on July 2, 2022, 1:07 a.m.