pre2.remove.genos: Remove genos with many empty values

Description

Remove columns (genos) that have too many missing values. All genos that have more than perc.snp values missing in both case.ped AND control.ped files will be removed.

Usage

pre2.remove.genos(file.dat, case.ped, control.ped, dir.dat, dir.out, 
dir.warning = dir.out, perc.snp = 10, perc.patient = 20, empty = "0/0", 
num.nonsnp.col = 5)

Arguments

file.dat

The name of the data file, as required by MaCH1. The file should have the following format:

         M SNP1
         M SNP2

        - Space separated
        - No header
        - Column 1: consists of "M"
        - Column 2: character SNP names
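
As a minimal sketch of the format above, the following base R snippet writes a two-line .dat file to a temporary path and reads it back. The file contents and the variable names (dat.path, dat, snp.names) are illustrative assumptions, not part of genMOSSplus.

```r
# Write a tiny two-SNP .dat file to a temporary location (illustrative data only)
dat.path <- tempfile(fileext = ".dat")
writeLines(c("M SNP1", "M SNP2"), dat.path)

# Read it back: space separated, no header, every column kept as character
dat <- read.table(dat.path, header = FALSE, sep = " ",
                  colClasses = "character", stringsAsFactors = FALSE)

# Basic format checks: exactly two columns, and column 1 is always "M"
stopifnot(ncol(dat) == 2, all(dat[[1]] == "M"))

# Column 2 holds the SNP names
snp.names <- dat[[2]]
```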

case.ped

The name of the pedigree data file that contains the CASEs, in MaCH input format.

control.ped

The name of the pedigree data file that contains the CONTROLs, in MaCH input format.

dir.dat

The name of the directory where file.dat, case.ped and control.ped can be found.

dir.out

The directory name to which output files should be saved.

dir.warning

The directory name to which warnings about patients with too many missing SNPs should go. Defaults to the same place as dir.out.

perc.snp

The maximum percentage (0-100) of empty values allowed for each geno (column). All genos that have more empty values than this threshold will be removed.

perc.patient

The percentage (0-100 percent) of empty values allowed for each patient (row). Names of all patients who end up having more empty values than this threshold will be recorded in the warnings file.

empty

The representation of a missing SNP value in the file ("0 0", "0/0", "1/1", "N N", etc).

num.nonsnp.col

The number of leading columns in the .ped files that do not contain SNP values. These first columns hold non-SNP values (such as patient ID, gender, etc). For the MaCH1 input format num.nonsnp.col is 5; for PLINK it is 6 (due to the extra disease status column).
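
To illustrate how num.nonsnp.col separates the two kinds of columns, here is a small base R sketch. It assumes each genotype is stored as a single whitespace-free token (such as "A/G"); the sample rows and the names ped.path, info and genos are made up for the example and are not taken from the package.

```r
# Two illustrative rows: 5 leading non-SNP columns, then 2 SNP columns
ped.path <- tempfile(fileext = ".ped")
writeLines(c("F1 P1 0 0 1 A/A 0/0",
             "F2 P2 0 0 2 A/G G/G"), ped.path)

# Read everything as character so genotype tokens are not converted
ped <- read.table(ped.path, header = FALSE,
                  colClasses = "character", stringsAsFactors = FALSE)

num.nonsnp.col <- 5  # MaCH1 layout; PLINK input would use 6

# Split the leading patient columns from the SNP columns
info  <- ped[, seq_len(num.nonsnp.col), drop = FALSE]
genos <- ped[, -seq_len(num.nonsnp.col), drop = FALSE]
```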

Details

All patients that have more than perc.patient percent of their values missing will have their IDs written to the "warning.<case.ped>.txt" and "warning.<control.ped>.txt" files. The output is two clean versions of the case.ped and control.ped files in the dir.out directory, and optionally the warning files in the dir.warning directory.
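
The filtering logic can be sketched in base R as follows. This is not the package's implementation: it works on small in-memory matrices with made-up data, and it assumes one reading of the description, namely that a SNP is dropped only when its missing percentage exceeds perc.snp in the case file and in the control file separately. The names too.sparse, case.clean, control.clean and flagged.case are hypothetical.

```r
# Toy genotype matrices (rows = patients, columns = SNPs); illustrative data only
case <- matrix(c("A/A", "0/0", "A/G",
                 "0/0", "0/0", "G/G",
                 "A/G", "0/0", "0/0"),
               nrow = 3, byrow = TRUE,
               dimnames = list(c("p1", "p2", "p3"), c("SNP1", "SNP2", "SNP3")))
control <- matrix(c("A/A", "0/0", "A/G",
                    "A/G", "0/0", "G/G",
                    "A/A", "A/A", "G/G"),
                  nrow = 3, byrow = TRUE,
                  dimnames = list(c("c1", "c2", "c3"), c("SNP1", "SNP2", "SNP3")))

empty <- "0/0"
perc.snp <- 10
perc.patient <- 20

# Percentage of missing values per SNP (column) in each file
miss.case    <- 100 * colMeans(case == empty)
miss.control <- 100 * colMeans(control == empty)

# Assumed reading: drop a SNP only if it exceeds the threshold in BOTH files
too.sparse    <- miss.case > perc.snp & miss.control > perc.snp
case.clean    <- case[, !too.sparse, drop = FALSE]
control.clean <- control[, !too.sparse, drop = FALSE]

# After removal, flag patients (rows) that still exceed perc.patient
miss.patient <- 100 * rowMeans(case.clean == empty)
flagged.case <- rownames(case.clean)[miss.patient > perc.patient]
```

In this toy data only SNP2 exceeds the threshold in both matrices, so SNP1 is kept even though a third of its case values are missing.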

The following files will be saved after the program is run:

 - <file.dat>.removed.dat - the .dat file containing only the SNPs that were not
       removed; placed in the dir.out directory
 - <case.ped>.removed.ped - the CASE .ped file without the columns that contain too
       many missing values, based on the threshold perc.snp; in the dir.out directory
 - <control.ped>.removed.ped - the CONTROL .ped file without the columns that contain
       too many missing values, based on the threshold perc.snp;
       in the dir.out directory

 - warning.<case.ped>.txt - file containing warning messages about patients that
       have too many SNPs missing (based on perc.patient) in the CASE .ped file,
       after the removal of bad SNPs.
 - warning.<control.ped>.txt - same as warning.<case.ped>.txt, but for the
       CONTROL file.

Author(s)

Olia Vesselova

See Also

pre1.plink2mach, pre1.plink2mach.batch, pre2.remove.genos.batch, pre3.call.mach, pre3.call.mach.batch

Examples

print("See the demo 'gendemo'.")

Example output

[1] "See the demo 'gendemo'."

genMOSSplus documentation built on May 1, 2019, 10:31 p.m.