Write a new netCDF or GDS file, setting certain SNPs to missing

Description

setMissingGenotypes copies an existing GDS or netCDF genotype file to a new one, setting SNPs in specified regions to missing.

Usage

setMissingGenotypes(parent.file, new.file, regions, file.type=c("gds", "ncdf"),
                    sample.include=NULL, compress="ZIP_RA", 
                    copy.attributes=TRUE, verbose=TRUE)

Arguments

parent.file

Name of the parent file

new.file

Name of the new file

regions

Data.frame of chromosome regions with columns "scanID", "chromosome", "left.base", "right.base", "whole.chrom".

file.type

The type of parent.file and new.file ("gds" or "ncdf")

sample.include

Vector of scanIDs to include in new.file. If NULL (the default), all samples are included.

compress

The compression level for variables in a GDS file (see add.gdsn for options).

copy.attributes

Logical value specifying whether to copy chromosome attributes to the new file.

verbose

Logical value specifying whether to show progress information.

Details

setMissingGenotypes removes chromosome regions by setting the genotypes of SNPs that fall within the regions listed in the regions argument (for example, chromosome anomalies) to NA, i.e., the missing value in the netCDF/GDS file. Entire samples may optionally be excluded as well: if the sample.include argument is given, only the scanIDs in this vector are written to the new file, so the sample dimension of the new file is length(sample.include).

For regions with whole.chrom=TRUE, the entire chromosome will be set to NA for that sample. For other regions, only the region between left.base and right.base will be set to NA.
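
The masking rule described above can be illustrated on a small in-memory matrix. This is only a sketch in base R, not the GWASTools implementation: the helper mask_regions, the toy SNP table, and the scan IDs are hypothetical, and treating left.base and right.base as inclusive bounds is an assumption.

```r
# Toy data: 6 SNPs x 3 scans, every genotype set to 1.
snps <- data.frame(chromosome = c(21, 21, 21, 22, 22, 22),
                   position   = c(100, 200, 300, 100, 200, 300))
scan.ids <- c(101, 102, 103)
geno <- matrix(1L, nrow = nrow(snps), ncol = length(scan.ids),
               dimnames = list(NULL, as.character(scan.ids)))

# Same columns as the regions argument of setMissingGenotypes.
regions <- data.frame(scanID      = c(101, 102),
                      chromosome  = c(21, 22),
                      left.base   = c(150, NA),
                      right.base  = c(300, NA),
                      whole.chrom = c(FALSE, TRUE))

# Hypothetical helper (not part of GWASTools): returns a copy of geno
# with the listed regions set to NA and only the requested scans kept.
mask_regions <- function(geno, snps, scan.ids, regions, sample.include = NULL) {
  if (!is.null(sample.include)) {
    keep <- scan.ids %in% sample.include
    geno <- geno[, keep, drop = FALSE]
    scan.ids <- scan.ids[keep]
  }
  for (i in seq_len(nrow(regions))) {
    col <- which(scan.ids == regions$scanID[i])
    if (length(col) == 0) next  # region belongs to an excluded scan
    on.chrom <- snps$chromosome == regions$chromosome[i]
    if (regions$whole.chrom[i]) {
      hit <- on.chrom  # whole chromosome for this scan
    } else {
      # Assumption: both bounds are inclusive.
      hit <- on.chrom &
        snps$position >= regions$left.base[i] &
        snps$position <= regions$right.base[i]
    }
    geno[hit, col] <- NA
  }
  geno
}

masked <- mask_regions(geno, snps, scan.ids, regions,
                       sample.include = c(101, 102))
```

In this sketch, masked has one column per entry of sample.include; scan 101 is missing only at the chromosome 21 SNPs between the two bounds, and scan 102 is missing at every chromosome 22 SNP.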

Author(s)

Stephanie Gogarten

See Also

gdsSubset, anomSegStats for chromosome anomaly regions

Examples

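library(GWASTools)

# Example GDS genotype file shipped with the GWASdata package
gdsfile <- system.file("extdata", "illumina_geno.gds", package="GWASdata")
gds <- GdsGenotypeReader(gdsfile)
# scanIDs of the first 10 samples; only these are written to the new file
sample.sel <- getScanID(gds, index=1:10)
close(gds)

# One region per scan: two base-pair intervals and one whole chromosome
regions <- data.frame("scanID"=sample.sel[1:3], "chromosome"=c(21,22,23),
  "left.base"=c(14000000, 30000000, NA), "right.base"=c(28000000, 450000000, NA),
  whole.chrom=c(FALSE, FALSE, TRUE))

# Write the masked copy to a temporary file, then clean up
newgds <- tempfile()
setMissingGenotypes(gdsfile, newgds, regions, file.type="gds", sample.include=sample.sel)
file.remove(newgds)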
