View source: R/simulateGenotypeMatrix.R
These functions create a simulated genotype or intensity file for use in tests and examples.
simulateGenotypeMatrix(n.snps=10, n.chromosomes=10,
                       n.samples=1000, filename,
                       file.type=c("gds", "ncdf"), silent=TRUE)
n.snps
An integer giving the number of SNPs per chromosome; the default value is 10. For this function, the number of SNPs is assumed to be the same for every chromosome.
n.chromosomes
An integer giving the total number of chromosomes; the default value is 10.
n.samples
An integer giving the number of samples in the data; the default value is 1000.
filename
A string that will be used as the name of the file. Use the same name later to open the file and retrieve the data generated by this function.
file.type
The type of file to create ("gds" or "ncdf").
silent
Logical value. If FALSE, the function returns a summary of the simulated data (see the return values described below).
The resulting file (GDS or netCDF, depending on file.type) will have the following characteristics:
Dimensions:
'snp': n.snps*n.chromosomes length
'sample': n.samples length
Variables:
'sampleID': sample dimension, values 1-n.samples
'position': snp dimension, values [1,2,...,n.chromosomes] n.snps times
'chromosome': snp dimension, values [1,1,...]n.snps times, [2,2,...]n.snps times, ..., [n.chromosomes,n.chromosomes,...]n.snps times
'genotype': 2-dimensional snp x sample, values 0, 1, 2 chosen from allele frequencies that were generated from a uniform distribution on (0,1). The missing rate is 0.05 (constant across all SNPs) and is denoted by -1.
OR
'quality': 2-dimensional snp x sample, values between 0 and 1 chosen randomly from a uniform distribution. There is one quality value per snp, so this value is constant across all samples.
'X': 2-dimensional snp x sample, value of X intensity drawn from a normal distribution. The mean of the distribution for each SNP depends on the sample genotype: 0 or 2 if the sample is homozygous, 1 if heterozygous.
'Y': 2-dimensional snp x sample, value of Y intensity also drawn from a normal distribution, with the mean chosen according to the mean of X so that the two means sum to 2.
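The layout described above can be sketched in base R. This is only an illustration, not the package's implementation: the binomial genotype draw, the independent per-call missingness, the standard deviation of 0.1, the choice to set the X mean equal to the genotype code, and the use of NA for missing intensities are all assumptions made for the example.

```r
# Illustrative base-R sketch of the simulated data; not the GWASTools code.
set.seed(1)  # assumed seed, only so the sketch is reproducible
n.snps <- 10; n.chromosomes <- 10; n.samples <- 1000
n.total <- n.snps * n.chromosomes   # length of the 'snp' dimension

# 'chromosome': [1,1,...] n.snps times, [2,2,...] n.snps times, ...
chromosome <- rep(seq_len(n.chromosomes), each = n.snps)

# One allele frequency per SNP from a uniform distribution on (0,1)
freq <- runif(n.total)

# Genotypes 0/1/2 per SNP (rows) and sample (columns).
# Drawing them as binomial allele counts is an assumption.
genotype <- matrix(rbinom(n.total * n.samples, size = 2,
                          prob = rep(freq, times = n.samples)),
                   nrow = n.total, ncol = n.samples)

# Intensities: X mean is 0 or 2 for homozygotes and 1 for heterozygotes,
# and the Y mean is chosen so the two means sum to 2 (sd = 0.1 is assumed).
X <- matrix(rnorm(n.total * n.samples, mean = genotype, sd = 0.1),
            nrow = n.total)
Y <- matrix(rnorm(n.total * n.samples, mean = 2 - genotype, sd = 0.1),
            nrow = n.total)

# Missing rate of 0.05, constant across SNPs; missing genotypes are -1.
# Setting the matching intensities to NA is an assumption.
missing <- matrix(runif(n.total * n.samples) < 0.05, nrow = n.total)
genotype[missing] <- -1
X[missing] <- NA
Y[missing] <- NA
```

With these objects, mean(genotype == -1) should be close to 0.05, and X + Y should be close to 2 wherever the call is not missing.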
simulateGenotypeMatrix returns a table of genotype calls if the silent variable is set to FALSE, where 2 indicates an AA genotype, 1 is AB, 0 is BB and -1 corresponds to a missing genotype call.
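As a small illustration of this coding, the sketch below tabulates a hypothetical vector of genotype codes with readable labels. The vector genot.codes and its values are invented for the example and do not come from the package.

```r
# Hypothetical genotype codes using the coding described above
genot.codes <- c(2, 1, 0, -1, 2, 2, 1, 0, 0, -1)

# Map each numeric code to its genotype label
calls <- factor(genot.codes,
                levels = c(2, 1, 0, -1),
                labels = c("AA", "AB", "BB", "missing"))

# Count how many calls fall into each class
counts <- table(calls)
print(counts)
```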
simulateIntensityMatrix returns a list if the silent variable is set to FALSE, which includes:
het
Heterozygosity table
nmiss
Number of missing values
A file is created and written to disk.
Caitlin McHugh
GdsGenotypeReader, GdsIntensityReader, NcdfGenotypeReader, NcdfIntensityReader
library(GWASTools)

# Simulate a genotype file with the default settings and inspect it
filenm <- tempfile()
simulateGenotypeMatrix(filename=filenm)
file <- GdsGenotypeReader(filenm)
file # notice the dimensions and variables listed
genot <- getGenotype(file)
table(genot) # can see the number of missing calls
chrom <- getChromosome(file)
unique(chrom) # there are indeed 10 chromosomes, as specified in the function call
close(file)

# Simulate an intensity file and inspect it
simulateIntensityMatrix(filename=filenm, silent=FALSE)
file <- GdsIntensityReader(filenm)
file # notice the dimensions and variables listed
xint <- getX(file)
yint <- getY(file)
cat("Number missing is:", sum(is.na(xint)), "\n")
chrom <- getChromosome(file)
unique(chrom) # there are indeed 10 chromosomes, as specified in the function call
close(file)

# Remove the temporary file
unlink(filenm)