genDataRead: Reading the genetic data from a file
In Haplin: Analyzing Case-Parent Triad and/or Case-Control Data with SNP Haplotypes

Description

This function reads genetic data from a PED-formatted or haplin-formatted file.

Usage

genDataRead(
  file.in = stop("Filename must be given!", call. = FALSE),
  file.out = NULL,
  dir.out = ".",
  format = stop("Format parameter is required!"),
  header = FALSE,
  n.vars,
  cov.file.in,
  cov.header,
  map.file,
  map.header = FALSE,
  allele.sep = ";",
  na.strings = "NA",
  col.sep = "",
  overwrite = NULL
)

Arguments

`file.in`	The name of the main input file with genotype information.
`file.out`	The base for the output filename (by default, constructed from the input file name).
`dir.out`	The path to the directory where the output files will be saved.
`format`	Format of the data (determines how the data are processed); one of: "haplin" (data already arranged with one row per family) or "ped" (data from a .ped file, where each row represents an individual).
`header`	Whether the first line of the main input file contains column names; default: FALSE; NB: this is useful only for 'haplin'-formatted files!
`n.vars`	The number of columns with covariate data (if any) in the main file. NB: if the main file is in PED format, the first 6 columns are assumed to contain the standard PED covariates (family ID, ID of the individual, ID of the father, ID of the mother, sex, and case-control status), so in this case setting 'n.vars' is useful only if the PED file contains more than 6 covariate columns.
`cov.file.in`	Name of the file containing additional covariate data, if any. Caution: unless the 'cov.header' argument is used, it is assumed that the first line of this file contains the header (i.e., the column names of the additional data).
`cov.header`	A character vector with the names of the covariate columns: in the additional covariate file, if one is given by the 'cov.file.in' argument, or in the main file, if it is a "haplin"-formatted file.
`map.file`	Filename (with path, if the file is not in the current directory) of the .map file holding the SNP names, if available (see Details).
`map.header`	Logical: does the map.file contain a header in the first row? Default: FALSE.
`allele.sep`	Character: separator between two alleles (default: ";").
`na.strings`	Character or NA: how the missing data is coded (default: "NA").
`col.sep`	Character: separator between the columns (i.e., markers; default: any whitespace character).
`overwrite`	Whether to overwrite the output files: if NULL (default), the user is prompted for an answer; if TRUE, any existing files are overwritten automatically; if FALSE, the function stops if the output files already exist.
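To make the roles of 'col.sep', 'allele.sep' and 'na.strings' more concrete, the base-R sketch below splits a whitespace-separated PED-style line into the six standard PED columns described under 'n.vars' plus the remaining genotype columns, and then splits genotype fields such as "A;G" on the default allele separator, turning the missing-data code into NA. The helpers split_ped_line() and split_genotypes() are hypothetical illustrations, not Haplin functions, and they do not reproduce Haplin's internal parser.

```r
# Hypothetical helper (not part of Haplin): splits one whitespace-separated
# PED-style line (col.sep = "" means any whitespace) into the 6 standard
# PED columns and the remaining genotype columns.
split_ped_line <- function(line) {
  fields <- strsplit(trimws(line), "[[:space:]]+")[[1]]
  list(
    covariates = fields[1:6],      # family, individual, father, mother, sex, status
    genotypes  = fields[-(1:6)]    # everything after the first 6 columns
  )
}

# Hypothetical helper (not part of Haplin): splits genotype fields such as
# "A;G" on the allele separator and replaces the missing-data code with NA.
split_genotypes <- function(x, allele.sep = ";", na.strings = "NA") {
  alleles <- strsplit(x, allele.sep, fixed = TRUE)
  out <- do.call(rbind, alleles)   # one row per genotype, one column per allele
  out[out %in% na.strings] <- NA
  out
}

ped <- split_ped_line("fam1 child1 father1 mother1 1 2 A G C C")
ped$covariates
ped$genotypes
split_genotypes(c("A;G", "NA;NA"))
```

The second call returns a 2 x 2 character matrix whose second row is entirely NA.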

Details

The function reads all the data in the file and creates ff objects to store the genetic information and a data.frame to store the covariate data (if any). These objects are saved in .RData and .ffData files, which can later be loaded back into R (with genDataLoad) and re-used.
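The save-once, reload-later pattern described above can be sketched in base R for the covariate part. This is only an analogy: Haplin's own files are written by genDataRead and read back with genDataLoad, and the ff-backed genotype data stored in the .ffData file are not covered here. The object and file names are hypothetical.

```r
# Base-R illustration of the ".RData" half of the save-and-reuse idea only;
# the ff-backed genotype data kept in ".ffData" files are not reproduced.
cov.data <- data.frame(family = c("fam1", "fam2"), smoking = c(0, 1))
rdata.file <- file.path(tempdir(), "exmpl_cov.RData")  # hypothetical file name

# Save once...
save(cov.data, file = rdata.file)

# ...and reload later (e.g. in a new R session) into a fresh environment,
# without parsing the original text file again.
restored <- new.env()
load(rdata.file, envir = restored)
identical(restored$cov.data, cov.data)  # TRUE
```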

Value

A list object with three elements:

cov.data - a data.frame with covariate data (if available in the input file)
gen.data - a list with chunks of the genetic data; the data are divided column-wise, using 10,000 columns per chunk; each element of this list is an ff matrix
aux - a list with meta-data and important parameters.
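The column-wise chunking of gen.data can be illustrated with a small arithmetic sketch: with at most 10,000 columns per chunk, n columns give ceiling(n / 10000) chunks, the last one possibly shorter. The helper chunk_ranges() is hypothetical and only shows which columns each chunk would cover; it is not how Haplin builds its ff matrices.

```r
# Hypothetical helper (not part of Haplin): column ranges covered by each
# chunk when n.cols columns are split into chunks of at most chunk.size.
chunk_ranges <- function(n.cols, chunk.size = 10000) {
  n.chunks <- ceiling(n.cols / chunk.size)
  starts <- (seq_len(n.chunks) - 1) * chunk.size + 1
  ends <- pmin(starts + chunk.size - 1, n.cols)   # last chunk may be shorter
  data.frame(chunk = seq_len(n.chunks), first.col = starts, last.col = ends)
}

# 25,000 columns -> three chunks: 1-10000, 10001-20000, 20001-25000
chunk_ranges(25000)
```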

Details

The .map file should contain at least two columns, where the second one contains the SNP names. Any additional columns should also be separated by a whitespace character; they will be ignored. The file should contain a header; note that 'map.header' defaults to FALSE, so set it to TRUE when reading such a file.
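The layout described above can be sketched with base R's read.table(): whitespace-separated columns, a header row, and the SNP names taken from the second column while the other columns are ignored. This is not Haplin's own reader; the column names (chr, snp, pos) and SNP identifiers are made up for illustration. With genDataRead, such a file would be passed through 'map.file' together with 'map.header = TRUE'.

```r
# Write a small example .map file with a header (contents are made up).
map.file <- file.path(tempdir(), "example.map")
writeLines(c(
  "chr snp pos",
  "1 rs0001 1000",
  "1 rs0002 2500"
), map.file)

# read.table's default sep = "" treats any whitespace as the separator;
# only the second column (SNP names) is kept, the rest is ignored.
map <- read.table(map.file, header = TRUE, stringsAsFactors = FALSE)
snp.names <- map[[2]]
snp.names  # [1] "rs0001" "rs0002"
```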

Usage note

When reading in a covariate file together with the genotype information, it is advisable to include a header in the file, so that there is no doubt about the names of the data columns.

Examples

  library( Haplin )
  # The argument 'overwrite' is set to TRUE!
  examples.dir <- system.file( "extdata", package = "Haplin" )
  # ped format:
  example.file2 <- file.path( examples.dir, "exmpl_data.ped" )
  ped.data.read <- genDataRead( example.file2, file.out = "exmpl_ped_data", 
   dir.out = tempdir( check = TRUE ), format = "ped", overwrite = TRUE )
  ped.data.read
  # haplin format:
  example.file1 <- file.path( examples.dir, "HAPLIN.trialdata2.txt" )
  haplin.data.read <- genDataRead( file.in = example.file1,
   file.out = "exmpl_haplin_data", format = "haplin", allele.sep = "", n.vars = 2, 
   cov.header = c( "smoking", "sex" ), overwrite = TRUE,
   dir.out = tempdir( check = TRUE ) )
  haplin.data.read

Haplin documentation built on Sept. 11, 2024, 7:13 p.m.