readGeno: Read Genotype Data Using GenABEL.


Description

readGeno reads genotype data stored in ped/map files, adjusts the format for the GenABEL package and processes the data for further analysis.

Usage

readGeno(
  genoFilename,
  mapFilename,
  covarFilename,
  existingRawFile = NULL,
  covarsampleID = "IID",
  sexvar = "Gender",
  projectfolder = "GT",
  projectname = NULL,
  organism = "human",
  Gonosomes2char = T,
  removeNullPositions = T,
  updateSNPpos = NULL,
  snpmart = useMart("ENSEMBL_MART_SNP", host = "feb2014.archive.ensembl.org",
    dataset = "hsapiens_snp"),
  GTarrayDescriptionFile = NULL,
  GTarrayDescription.lines2skip.start = "\\[Assay\\]",
  GTarrayDescription.lines2skip.end = "\\[Controls\\]",
  GTarrayDescription.colname.identifier = "Name",
  GTarrayDescription.colname.coding = "SNP",
  GTarrayDescription.colname.strand = "RefStrand",
  GTarrayDescription.colname.chromosome = "Chr",
  GTarrayDescription.colname.position = "MapInfo"
)

Arguments

genoFilename

Character with path to ped file.

mapFilename

Character with path to map file.

covarFilename

Character with path to covariates file (gender information is needed).

existingRawFile

Character with path to an optional, already existing GenABEL raw file. If given, this file is loaded in preference to, and supersedes, all other input files. Ignored if NULL.

covarsampleID

Character with column name of sample IDs in covar file.

sexvar

Character with column name indicating gender in covar file.

projectfolder

Character containing path to output folder (created if it does not exist).

projectname

Character used as suffix for output files.

organism

Character with name of organism (e.g. "human").

Gonosomes2char

Boolean. If TRUE, chromosome names 23, 24, 25, 26 will be converted to X, Y, XY, MT (for human only).

removeNullPositions

Boolean. If TRUE, SNPs with Chr=0, Pos=0 or NA will be removed from the generated gwaa object.

updateSNPpos

Character with value "biomaRt" or "descriptionfile", or NULL. For "biomaRt", chromosome and SNP positions in the map file are updated via biomaRt. For "descriptionfile", they are updated from the supplied GTarrayDescriptionFile. If NULL, chromosome and SNP positions are not updated.

snpmart

biomaRt object to be used for updating SNP positions (or NULL).

GTarrayDescriptionFile

Optional character with path to Illumina array description file. If given, strand and allele coding data will be added to the map file.

GTarrayDescription.lines2skip.start

Numeric with number of rows to skip when loading the array description file, or a regular expression matching a character string that identifies the corresponding row, e.g. [Assay] in Illumina annotation files.

GTarrayDescription.lines2skip.end

Numeric with number of rows to read when loading the array description file, or a regular expression matching a character string that identifies the corresponding row, e.g. [Controls] in Illumina annotation files, which marks the start of the control probe annotation. All rows from that row on (including the row matched by GTarrayDescription.lines2skip.end) are skipped.

GTarrayDescription.colname.identifier

Character with column name of the SNP identifier in the array description file.

GTarrayDescription.colname.coding

Character with column name of the allele coding information.

GTarrayDescription.colname.strand

Character with column name of the strand information.

GTarrayDescription.colname.chromosome

Character with column name of the chromosome name.

GTarrayDescription.colname.position

Character with column name of the bp position.
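
The way the two lines2skip arguments delimit the SNP table can be illustrated with a small base-R sketch. This is not the package's own code: the file contents below are invented, and it is an assumption that the header row directly follows the [Assay] line, as in typical Illumina description files.

```r
# Toy stand-in for an Illumina-style array description file (contents invented).
manifest_lines <- c(
  "Illumina, Inc.",
  "[Heading]",
  "Descriptor File Name,toy.bpm",
  "[Assay]",
  "Name,SNP,RefStrand,Chr,MapInfo",
  "rs0001,[A/G],+,1,12345",
  "rs0002,[T/C],-,23,67890",
  "[Controls]",
  "control_probe_1,Staining"
)

# Locate the marker rows with the same regular expressions as the defaults.
start_row <- grep("\\[Assay\\]", manifest_lines)[1]
end_row   <- grep("\\[Controls\\]", manifest_lines)[1]

# Skip everything up to and including [Assay];
# drop the [Controls] row and all rows after it.
body <- manifest_lines[(start_row + 1):(end_row - 1)]
array_desc <- read.csv(text = body, stringsAsFactors = FALSE)

# Select the columns named by the GTarrayDescription.colname.* defaults.
array_desc[, c("Name", "SNP", "RefStrand", "Chr", "MapInfo")]
```

If a description file uses different column headers, the GTarrayDescription.colname.* arguments would need to be set to match them.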

Details

If the GenABEL raw data file given in existingRawFile already exists, it is loaded directly into a gwaa object. Otherwise, the raw file first has to be created from the supplied ped and map files. For this, the map file in mapFilename is loaded, the linkage column (column 3) is removed if present, and a header line is added. If an array description file is supplied in GTarrayDescriptionFile, an extended map file including strand and allele coding information is generated for GenABEL. Otherwise, strand information is set to unknown. If indicated in updateSNPpos, SNP chromosomal positions can be updated either via biomaRt (if a biomaRt object is given in snpmart) or from GTarrayDescriptionFile. Note that GenABEL does not allow character strings as alleles in the corresponding ped file (as used for indels), only 1-character alleles or "-".
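
The map file preparation described here (dropping the linkage column and adding a header line) can be sketched in base R as follows. This is an illustrative sketch under stated assumptions, not the function's actual implementation: the toy file contents and the header names chr, name and pos are chosen for illustration only.

```r
# Toy PLINK-style map file: chromosome, SNP name, linkage (cM), bp position.
map_path <- tempfile(fileext = ".map")
writeLines(c("1\trs0001\t0\t12345",
             "1\trs0002\t0\t23456",
             "23\trs0003\t0\t34567"), map_path)

map <- read.table(map_path, header = FALSE, stringsAsFactors = FALSE)

# Drop the linkage column (column 3) only if it is present.
if (ncol(map) == 4) {
  map <- map[, -3]
}

# Add a header line; these column names are assumed for illustration.
names(map) <- c("chr", "name", "pos")

# Write the adjusted map file with its header line.
out_path <- tempfile(fileext = ".map")
write.table(map, out_path, quote = FALSE, row.names = FALSE, sep = "\t")
readLines(out_path)[1]
```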

After loading the GenABEL raw file, phenotype information is added from the supplied covariates file covarFilename. That is, phenotype and gender data in the initial ped file are not considered. White space is not allowed in phenotype entries, and missing data are not allowed in the gender column given in sexvar. All samples from the genotype file must have entries in the covariates file. For 1/2-coded binary phenotypes, additional 0/1-coded phenotypes are generated (stored with suffix "01"). If gender is 1/2-coded (1=male, 2=female) in sexvar, a new 0/1-coded variable sex is generated (1=male, 0=female) as required by GenABEL. Optionally, numeric gonosome names can be converted to character names (e.g. "X", "Y"). If removeNullPositions is set to TRUE, variants without valid coordinates are removed from the generated gwaa dataset.
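
The recoding and filtering rules in this paragraph can be tried out on toy data. The following is a minimal base-R sketch, not the package's implementation: the data frames, the columns case and age, the helper is_12_coded and the choice to leave the gender column out of the "01" recoding are all assumptions made for illustration.

```r
# Toy covariates table; all columns except IID and Gender are invented.
covar <- data.frame(
  IID    = c("S1", "S2", "S3", "S4"),
  Gender = c(1, 2, 2, 1),      # 1 = male, 2 = female
  case   = c(1, 2, 1, 2),      # 1/2-coded binary phenotype
  age    = c(34, 51, 47, 29),
  stringsAsFactors = FALSE
)

# The gender column must not contain missing values.
stopifnot(!anyNA(covar$Gender))

# 1/2-coded gender -> sex coded 1 = male, 0 = female.
if (all(covar$Gender %in% c(1, 2))) {
  covar$sex <- ifelse(covar$Gender == 1, 1, 0)
}

# Hypothetical helper: TRUE for numeric columns holding exactly the values 1 and 2.
is_12_coded <- function(x) {
  x <- x[!is.na(x)]
  is.numeric(x) && all(x %in% c(1, 2)) && length(unique(x)) == 2
}

# Add a 0/1-coded copy (suffix "01") of every 1/2-coded binary phenotype.
for (col in setdiff(names(covar), c("IID", "Gender", "sex"))) {
  if (is_12_coded(covar[[col]])) {
    covar[[paste0(col, "01")]] <- covar[[col]] - 1
  }
}

# Toy SNP annotation with numeric gonosome names and invalid coordinates.
snps <- data.frame(
  name = c("rs1", "rs2", "rs3", "rs4", "rs5"),
  chr  = c("1", "23", "0", "26", "24"),
  pos  = c(100, 200, 300, NA, 0),
  stringsAsFactors = FALSE
)

# Gonosomes2char: 23, 24, 25, 26 -> X, Y, XY, MT (human only).
gonosome_names <- c("23" = "X", "24" = "Y", "25" = "XY", "26" = "MT")
hit <- snps$chr %in% names(gonosome_names)
snps$chr[hit] <- unname(gonosome_names[snps$chr[hit]])

# removeNullPositions: drop SNPs with Chr = 0, Pos = 0 or NA.
keep <- !is.na(snps$chr) & snps$chr != "0" & !is.na(snps$pos) & snps$pos != 0
snps <- snps[keep, ]
```

In this toy data, age is left untouched because it is not 1/2-coded, and only the SNPs with a non-zero chromosome and a non-missing, non-zero position remain.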

Value

GenABEL gwaa object

Author(s)

Frank Ruehle


frankRuehle/systemsbio documentation built on Sept. 14, 2020, 1:18 a.m.