Load phased genotype data


Description

This function loads phased genotype data from a set of three text files (.samples, .markers and .phase) and converts it into a native GHap.phase object.

Usage

  ghap.loadphase(samples.file, markers.file, phase.file, verbose = TRUE)

Arguments

samples.file

Path to the .samples file containing individual information.

markers.file

Path to the .markers file containing the variant map.

phase.file

Path to the .phase file containing the phased genotype matrix.

verbose

A logical value specifying whether log messages should be printed (default = TRUE).

Value

The returned GHap.phase object is a list with components:

chr

A character value indicating chromosome identity. The current version of the package can handle only one chromosome at a time.

nsamples

An integer value for the sample size.

nmarkers

An integer value for the number of markers.

nsamples.in

An integer value for the number of active samples.

nmarkers.in

An integer value for the number of active markers.

pop

A character vector relating chromosome alleles to populations. This information is obtained from the first column of the sample file.

id

A character vector mapping chromosome alleles to samples. This information is obtained from the second column of the sample file.

id.in

A logical vector indicating active chromosome alleles. By default, all chromosome alleles are set to TRUE.

marker

A character vector containing marker names. This information is obtained from the second column of the marker map file.

marker.in

A logical vector indicating active markers. By default, all markers are set to TRUE.

bp

A numeric vector with marker positions. This information is obtained from the third column of the marker map file.

A0

A character vector with reference alleles. This information is obtained from the fourth column of the marker map file.

A1

A character vector with alternative alleles. This information is obtained from the fifth column of the marker map file.

phase

A big.matrix object containing the phased genotype matrix.
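To make the column-to-component mapping above concrete, the following base-R sketch builds a plain list with the same component names from small in-memory tables. It is a hypothetical illustration, not GHap's loader: the function name make_phase_like() is invented here, phase is stored as an ordinary matrix rather than a big.matrix, and pop and id are assumed to hold one entry per chromosome allele (two per individual, in the same order as the columns of the phase matrix), which is how the descriptions above read.

```r
# Toy inputs mirroring the .samples, .markers and .phase layouts
samples <- data.frame(V1 = c("pop1", "pop2"), V2 = c("ind1", "ind2"),
                      stringsAsFactors = FALSE)
markers <- data.frame(V1 = "1", V2 = c("snp1", "snp2", "snp3"),
                      V3 = c(100, 250, 400), V4 = c("A", "C", "G"),
                      V5 = c("G", "T", "A"), stringsAsFactors = FALSE)
# m = 3 markers (rows) x 2n = 4 chromosome alleles (columns)
phase.mat <- matrix(c(0, 1, 0,  1, 0, 0,  1, 0, 1,  1, 1, 0), nrow = 3)

# Hypothetical assembly of a GHap.phase-like list (plain matrix, not big.matrix)
make_phase_like <- function(samples, markers, phase.mat) {
  n <- nrow(samples)   # number of individuals
  m <- nrow(markers)   # number of markers
  list(
    chr         = as.character(markers$V1[1]),
    nsamples    = n,
    nmarkers    = m,
    nsamples.in = n,                            # all samples active by default
    nmarkers.in = m,                            # all markers active by default
    pop         = rep(samples$V1, each = 2),    # assumed: one entry per allele
    id          = rep(samples$V2, each = 2),
    id.in       = rep(TRUE, 2 * n),
    marker      = markers$V2,
    marker.in   = rep(TRUE, m),
    bp          = markers$V3,
    A0          = markers$V4,
    A1          = markers$V5,
    phase       = phase.mat
  )
}

toy <- make_phase_like(samples, markers, phase.mat)
```

Under these assumptions, the phase matrix always has nmarkers rows and 2 * nsamples columns, and id, pop and id.in all have one element per column.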

The supported format consists of three files with the following suffixes:

  • .samples: space-delimited file without header containing two columns: Population and ID. Note that the Population column serves solely to group samples, so the user can define any arbitrary family/cluster/subgroup and use it as a "population" tag.

  • .markers: space-delimited file without header containing five columns: Chromosome, Marker, Position (in bp), Reference Allele (A0) and Alternative Allele (A1). Markers should be on a single chromosome and sorted by position. Repeated positions are tolerated, but a warning message is given when the data is loaded.

  • .phase: space-delimited file without header containing the phased genotype matrix. The dimension of the matrix is expected to be m x 2n, where m is the number of markers and n is the number of individuals (i.e., two columns per individual, representing the two phased chromosome alleles). Alleles must be coded as 0 and 1. No missing values are allowed, since imputation is assumed to be part of the phasing procedure.
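As a hedged illustration of the format rules above, the base-R sketch below writes a toy data set (two individuals, three markers) to a temporary directory and checks the constraints listed: a single chromosome, positions sorted (ties allowed, with a warning), an m x 2n matrix coded as 0/1 and no missing values. The helper check_phase_files() and the toy file names are hypothetical; the sketch does not call GHap and is not the package's own validation code.

```r
# Hypothetical helper (not part of GHap): checks a .samples/.markers/.phase
# trio against the format rules above and stops on the first violation.
check_phase_files <- function(samples.file, markers.file, phase.file) {
  samples <- read.table(samples.file, header = FALSE, colClasses = "character")
  markers <- read.table(markers.file, header = FALSE, colClasses = "character")
  phase <- as.matrix(read.table(phase.file, header = FALSE))

  if (ncol(samples) != 2) stop(".samples must have 2 columns (Population, ID)")
  if (ncol(markers) != 5) stop(".markers must have 5 columns")
  if (length(unique(markers[[1]])) != 1) stop("markers must be on one chromosome")

  pos <- as.numeric(markers[[3]])
  if (anyNA(pos) || is.unsorted(pos)) stop("positions must be numeric and sorted")
  if (anyDuplicated(pos) > 0) warning("repeated marker positions found")

  m <- nrow(markers)   # number of markers
  n <- nrow(samples)   # number of individuals
  if (!identical(dim(phase), c(m, 2L * n))) stop("phase matrix must be m x 2n")
  if (anyNA(phase)) stop("missing values are not allowed")
  if (!all(phase %in% c(0, 1))) stop("alleles must be coded as 0 and 1")
  invisible(TRUE)
}

# Toy files: 2 individuals, 3 markers, so the phase matrix is 3 x 4
tmp <- tempdir()
writeLines(c("pop1 ind1", "pop1 ind2"), file.path(tmp, "toy.samples"))
writeLines(c("1 snp1 100 A G",
             "1 snp2 250 C T",
             "1 snp3 400 G A"), file.path(tmp, "toy.markers"))
writeLines(c("0 1 1 1",
             "1 0 0 1",
             "0 0 1 0"), file.path(tmp, "toy.phase"))

ok <- check_phase_files(file.path(tmp, "toy.samples"),
                        file.path(tmp, "toy.markers"),
                        file.path(tmp, "toy.phase"))
```

Reading every column as character before converting positions keeps allele codes such as T or F from being coerced to logical values by read.table.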

Author(s)

Yuri Tani Utsunomiya <ytutsunomiya@gmail.com>

Marco Milanesi <marco.milanesi.mm@gmail.com>

Examples

# #### DO NOT RUN IF NOT NECESSARY ###
# 
# # Copy the example data in the current working directory
# ghap.makefile()
# 
# 
# ### RUN ###
# 
# # Load data
# phase <- ghap.loadphase("human.samples", "human.markers", "human.phase")