simulateGenotypeMatrix: Simulate Genotype or Intensity Matrix & Load into GDS/NetCDF...

View source: R/simulateGenotypeMatrix.R

simulateGenotypeMatrixR Documentation

Simulate Genotype or Intensity Matrix & Load into GDS/NetCDF File

Description

These functions create a simulated genotype or intensity file for test and examples.

Usage

simulateGenotypeMatrix(n.snps=10, n.chromosomes=10,
                       n.samples=1000, filename,
                       file.type=c("gds", "ncdf"), silent=TRUE)

Arguments

n.snps

An integer corresponding to the number of SNPs per chromosome, the default value is 10. For this function, the number of SNPs is assumed to be the same for every chromosome.

n.chromosomes

An integer value describing the total number of chromosomes with default value 10.

n.samples

An integer representing the number of samples for our data. The default value is 1000 samples.

filename

A string that will be used as the name of the file. This is to be used later when opening and retrieving data generated from this function.

file.type

The type of file to create ("gds" or "ncdf")

silent

Logical value. If FALSE, the function returns a table of genotype counts generated. The default is TRUE; no data will be returned in this case.

Details

The resulting netCDF file will have the following characteristics:

Dimensions:

'snp': n.snps*n.chromosomes length

'sample': n.samples length

Variables:

'sampleID': sample dimension, values 1-n.samples

'position': snp dimension, values [1,2,...,n.chromosomes] n.snps times

'chromosome': snp dimension, values [1,1,...]n.snps times, [2,2,...]n.snps times, ..., [n.chromosomes,n.chromosomes,...]n.snps times

'genotype': 2-dimensional snp x sample, values 0, 1, 2 chosen from allele frequencies that were generated from a uniform distribution on (0,1). The missing rate is 0.05 (constant across all SNPs) and is denoted by -1.

OR

'quality': 2-dimensional snp x sample, values between 0 and 1 chosen randomly from a uniform distribution. There is one quality value per snp, so this value is constant across all samples.

'X': 2-dimensional snp x sample, value of X intensity taken from a normal distribution. The mean of the distribution for each SNP is based upon the sample genotype. Mean is 0,2 if sample is homozygous, 1 if heterozygous.

'Y': 2-dimensional snp x sample, value of Y intensity also chosen from a normal distribution, where the mean is chosen according to the mean of X so that sum of means = 2.

Value

simulateGenotypeMatrix returns a table of genotype calls if the silent variable is set to FALSE, where 2 indicates an AA genotype, 1 is AB, 0 is BB and -1 corresponds to a missing genotype call.

simulateIntensityMatrix returns a list if the silent variable is set to FALSE, which includes:

het

Heterozygosity table

nmiss

Number of missing values

A file is created and written to disk.

Author(s)

Caitlin McHugh

See Also

GdsGenotypeReader, GdsIntensityReader, NcdfGenotypeReader, NcdfIntensityReader

Examples

filenm <- tempfile()

simulateGenotypeMatrix(filename=filenm )

file <- GdsGenotypeReader(filenm)
file #notice the dimensions and variables listed

genot <- getGenotype(file)
table(genot) #can see the number of missing calls

chrom <- getChromosome(file)
unique(chrom) #there are indeed 10 chromosomes, as specified in the function call

close(file)

simulateIntensityMatrix(filename=filenm, silent=FALSE )

file <- GdsIntensityReader(filenm)
file #notice the dimensions and variables listed

xint <- getX(file)
yint <- getY(file)
print("Number missing is: "); sum(is.na(xint))

chrom <- getChromosome(file)
unique(chrom) #there are indeed 10 chromosomes, as specified in the function call

close(file)

unlink(filenm)

smgogarten/GWASTools documentation built on Nov. 10, 2024, 9:54 p.m.