pre2.remove.genos.batch: Remove genos with many empty values for all files


Description

For all specified files, remove the columns (genos) that have too many missing values. The function automatically matches CASE and CONTROL .ped files with their corresponding .dat files, based on the specified prefixes, keys, and endings.
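
To make the prefix/key/ending matching concrete, here is a minimal sketch in R. It assumes that a file matches when its name starts with the prefix, contains the key, and ends with the ending, and that the chromosome number is the run of digits directly after the prefix. The helper match_chrom is hypothetical and is not part of genMOSSplus; the package's actual matching rules may differ.

```r
# Hypothetical helper, not part of genMOSSplus. Assumption: a file matches
# when its name starts with `prefix`, contains `key` and ends with `ending`,
# and the chromosome number is the run of digits right after `prefix`.
match_chrom <- function(fname, prefix, key, ending) {
  ok <- startsWith(fname, prefix) &&
    endsWith(fname, ending) &&
    (key == "" || grepl(key, fname, fixed = TRUE))
  if (!ok) return(NA_integer_)
  # Text following the prefix, e.g. "12_CASE.ped" for prefix "chr".
  rest <- substring(fname, nchar(prefix) + 1)
  digits <- regmatches(rest, regexpr("^[0-9]+", rest))
  if (length(digits) == 0) return(NA_integer_)
  as.integer(digits)
}

match_chrom("chr12_CASE.ped", "chr", "CASE", ".ped")     # 12
match_chrom("chr12_CONTROL.ped", "chr", "CASE", ".ped")  # NA (key not found)
match_chrom("chr3.dat", "chr", "", ".dat")               # 3 (empty key matches)
```

Applied over a directory listing, for example sapply(list.files(dir.dat), match_chrom, prefix = "chr", key = "", ending = ".dat"), this yields a chromosome number (or NA) per file, which is enough to group files by chromosome.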

Usage

pre2.remove.genos.batch(dir.dat, dir.ped = dir.dat, dir.out, 
dir.warning = dir.out, perc.snp = 10, perc.patient = 20, empty = "0/0", 
num.nonsnp.col = 5, prefix.dat, prefix.case, prefix.control, key.dat = "", 
key.case = "CASE", key.control = "CONTROL", ending.dat = ".dat", 
ending.case = ".ped", ending.control = ".ped")

Arguments

dir.dat

The directory name where all .dat files can be found.

dir.ped

The directory name where all CASE and CONTROL .ped files can be found. Defaults to dir.dat.

dir.out

The directory name to which output files should be saved.

dir.warning

The directory name to which warnings about patients with too many missing SNPs should go. Defaults to the same place as dir.out.

perc.snp

The maximum percentage (0-100) of empty values allowed for each geno (column). All genos whose percentage of empty values exceeds this threshold are removed.

perc.patient

The maximum percentage (0-100) of empty values allowed for each patient (row). The names of all patients whose percentage of empty values ends up above this threshold are recorded in the warnings file.

empty

The representation of a missing SNP value in the file ("0 0", "0/0", "1/1", "N N", etc.).

num.nonsnp.col

The number of leading columns in the .ped files that do not contain SNP values (such as patient ID, gender, etc.). For the MaCH1 input format num.nonsnp.col = 5; for PLINK it is 6, because of the extra disease-status column.

prefix.dat

The beginning of the .dat file name, up to the chromosome number.

prefix.case

The beginning of the CASE pedigree file name, up to the chromosome number.

prefix.control

The beginning of the CONTROL pedigree file name, up to the chromosome number.

key.dat

Any keyword in the name of the .dat file that distinguishes it from other files.

key.case

Any keyword in the name of the CASE pedigree file that distinguishes it from other files (non-pedigree or non-CASE files).

key.control

Any keyword in the name of the CONTROL pedigree file that distinguishes it from other files (non-pedigree or non-CONTROL files).

ending.dat

The ending of the .dat filenames.

ending.case

The ending of the CASE pedigree filenames.

ending.control

The ending of the CONTROL pedigree filenames.
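
The perc.snp, perc.patient, empty, and num.nonsnp.col arguments together define the missingness filter. The following base-R sketch illustrates that logic under stated assumptions: the .ped data is already loaded as a character matrix with one row per patient, and each SNP column holds a single genotype string (such as "A/C") or the empty code. The function filter_genos is hypothetical, not the genMOSSplus implementation, and it returns row indices rather than writing patient names to a warnings file.

```r
# Hypothetical sketch, not the genMOSSplus implementation. Assumes `ped` is a
# character matrix with one row per patient: the first `num.nonsnp.col`
# columns hold non-SNP data, and every remaining column holds one genotype
# string such as "A/C", or `empty` when the value is missing.
filter_genos <- function(ped, perc.snp = 10, perc.patient = 20,
                         empty = "0/0", num.nonsnp.col = 5) {
  snp_idx <- seq.int(num.nonsnp.col + 1, ncol(ped))
  is_empty <- ped[, snp_idx, drop = FALSE] == empty
  # Percentage of empty values in each geno (column); keep those at or
  # below perc.snp.
  keep <- 100 * colMeans(is_empty) <= perc.snp
  kept <- ped[, c(seq_len(num.nonsnp.col), snp_idx[keep]), drop = FALSE]
  # Percentage of empty values per patient (row) over the retained genos;
  # rows above perc.patient are flagged for the warnings file.
  row_perc <- 100 * rowMeans(is_empty[, keep, drop = FALSE])
  list(ped = kept, flagged = which(row_perc > perc.patient))
}

ped <- rbind(
  c("F1", "P1", "0", "0", "1", "A/C", "0/0", "G/G"),
  c("F2", "P2", "0", "0", "2", "A/A", "0/0", "0/0"),
  c("F3", "P3", "0", "0", "1", "C/C", "0/0", "G/T"),
  c("F4", "P4", "0", "0", "2", "A/C", "T/T", "G/G")
)
res <- filter_genos(ped, perc.snp = 30, perc.patient = 20)
res$flagged  # 2: P2 is missing 1 of the 2 retained genos (50%)
```

In this example the middle geno is 75% empty and is dropped, while the last geno (25% empty) is kept. Patient flagging is computed only over the retained genos, which is one reading of "end up having" in the perc.patient description.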

Details

Removes SNPs whose percentage of empty geno values exceeds perc.snp from all corresponding CASE and CONTROL .ped files (in dir.ped) and .dat files (in dir.dat). If the .ped file for a chromosome is split into several files, these files are concatenated, in alphabetical order, into a single file for that chromosome. Chromosomes whose files satisfy the (prefix, key, ending) selection criteria but do NOT form a complete set of three files (CASE, CONTROL, and .dat) are NOT processed.
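
The grouping rules above (one set of files per chromosome, split files ordered alphabetically, incomplete sets skipped) can be sketched in R as follows. This assumes the matched files have already been collected into a data frame with hypothetical columns file, chrom, and type; the function plan_chroms is illustrative only and is not part of genMOSSplus.

```r
# Hypothetical sketch, not the genMOSSplus implementation. `files` is a
# data.frame with one row per matched file and columns `file`, `chrom`, and
# `type` ("dat", "case" or "control").
plan_chroms <- function(files) {
  by_chrom <- split(files, files$chrom)
  # Skip chromosomes lacking any of the three file types.
  complete <- Filter(function(d) all(c("dat", "case", "control") %in% d$type),
                     by_chrom)
  # Within each chromosome, list the files of each type in alphabetical
  # order, i.e. the order in which split files would be concatenated.
  lapply(complete, function(d)
    lapply(split(as.character(d$file), as.character(d$type)), sort))
}

files <- data.frame(
  file = c("chr1_CASE_b.ped", "chr1_CASE_a.ped", "chr1_CONTROL.ped",
           "chr1.dat", "chr2_CASE.ped", "chr2.dat"),
  chrom = c(1, 1, 1, 1, 2, 2),
  type = c("case", "case", "control", "dat", "case", "dat"),
  stringsAsFactors = FALSE
)
plan <- plan_chroms(files)
names(plan)       # "1": chromosome 2 has no CONTROL file, so it is skipped
plan[["1"]]$case  # "chr1_CASE_a.ped" "chr1_CASE_b.ped"
```

Note that R's sort() follows the current locale's collation, so "alphabetical" order for names that differ in case or punctuation may vary between systems.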

Author(s)

Olia Vesselova

See Also

pre1.plink2mach, pre1.plink2mach.batch, pre2.remove.genos, pre3.call.mach, pre3.call.mach.batch

Examples

print("See the demo 'gendemo'.")

Example output

[1] "See the demo 'gendemo'."

genMOSSplus documentation built on May 1, 2019, 10:31 p.m.