getCausalSNPs: Draw random SNPs from genotypes.


View source: R/genotypeFunctions.R

Description

Draw random SNPs from genotypes provided or external genotype files. When drawing from external genotype files, only lines of randomly chosen SNPs are read, which is recommended for large genotype files. See details for more information.

Usage

getCausalSNPs(N, NrCausalSNPs = 20, genotypes = NULL, chr = NULL,
  NrSNPsOnChromosome = NULL, NrChrCausal = NULL, genoFilePrefix = NULL,
  genoFileSuffix = NULL, oxgen = FALSE, delimiter = ",",
  skipFields = NULL, probabilities = FALSE, sampleID = "ID_",
  verbose = TRUE)

Arguments

N

Number [integer] of samples to simulate.

NrCausalSNPs

Number [integer] of SNPs to choose at random.

genotypes

[NrSamples x totalNrSNPs] Matrix of genotypes [integer]/ [double].

chr

Vector of chromosome(s) [integer] to choose NrCausalSNPs from; only used when external genotype data is provided, i.e. !is.null(genoFilePrefix).

NrSNPsOnChromosome

Vector of number(s) of SNPs [integer] per entry in chr (see above); has to be the same length as chr. If not provided, lines in file will be counted (which can be slow for large files).

NrChrCausal

Number [integer] of causal chromosomes to sample NrCausalSNPs from (as opposed to the actual chromosomes to choose from via chr); only used when external genotype data is provided, i.e. !is.null(genoFilePrefix).

genoFilePrefix

Full path to the chromosome-wise genotype files, up to but not including "chrChromosomeNumber" (no '~' expansion!) [string].

genoFileSuffix

[string] Part of the file name following the chromosome number, including the file extension (e.g. ".csv"); the file described by genoFilePrefix-genoFileSuffix has to be in a text format, i.e. comma/tab/space separated.

oxgen

[boolean] Is genoFilePrefix-genoFileSuffix file in oxgen format? See readStandardGenotypes for details.

delimiter

Field separator [string] of genotypefile or genoFilePrefix-genoFileSuffix file.

skipFields

Number [integer] of fields (columns) to skip in genoFilePrefix-genoFileSuffix file. See details.

probabilities

[boolean] If set to TRUE, the genotypes in the files described by genoFilePrefix-genoFileSuffix are provided as triplets of probabilities (p(AA), p(Aa), p(aa)) and are converted into their expected genotypes by 0*p(AA) + 1*p(Aa) + 2*p(aa) via probGen2expGen.

sampleID

Prefix [string] for naming samples (will be followed by sample number from 1 to N when constructing id_samples)

verbose

[boolean] If TRUE, progress info is printed to standard out
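To make the conversion described for the probabilities argument concrete, here is a minimal base-R sketch of turning probability triplets into expected genotypes. It does not call probGen2expGen; the helper name expectedGenotype and the toy values are assumptions made purely for illustration.

```r
# Hypothetical helper (not part of PhenotypeSimulator): convert a vector of
# genotype probability triplets (p(AA), p(Aa), p(aa)) into expected genotypes.
expectedGenotype <- function(probs) {
    stopifnot(length(probs) %% 3 == 0)
    # One row per sample; columns are p(AA), p(Aa), p(aa)
    triplets <- matrix(probs, ncol = 3, byrow = TRUE)
    # Expected genotype: 0 * p(AA) + 1 * p(Aa) + 2 * p(aa)
    as.vector(triplets %*% c(0, 1, 2))
}

# Toy triplets for three samples at a single SNP
probs <- c(1,   0,   0,
           0,   1,   0,
           0.1, 0.3, 0.6)
expGeno <- expectedGenotype(probs)
```

With certain genotype calls (a probability of 1 for one genotype) the result is the usual 0/1/2 coding; uncertain calls give non-integer values, which is why the returned matrix may be [double] rather than [integer].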

Details

To choose SNPs from external genotype files without reading them into memory, genotypes for each chromosome need to be accessible as [SNPs x samples] in a separate file containing "chrChromosomenumber" (e.g. chr22) in the file name (e.g. /path/to/dir/related_nopopstructure_chr22.csv). All genotype files need to be saved in the same directory. genoFilePrefix (/path/to/dir/related_nopopstructure_) and genoFileSuffix (.csv) specify the strings leading and following the "chrChromosomenumber". The first column in each file needs to be the SNP_ID, and files cannot contain a header. Subsequent columns containing additional SNP information can be skipped by setting skipFields.

getCausalSNPs generates a vector of chromosomes from which to sample the SNPs. For each of these chromosomes, it counts the number of SNPs in the chromosome file (unless NrSNPsOnChromosome is provided) and creates a vector of random numbers drawn from 1:NrSNPSinFile. Only the lines corresponding to these numbers are then read into R.

The example data provided for chromosome 22 contains genotypes (50 samples) of the first 500 SNPs on chromosome 22 with a minor allele frequency of greater than 2% from the 1000 Genomes project.
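The line-sampling strategy described in this section can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's implementation: the toy file written to a temporary directory, the helper name readSampledLines, and the "rs" SNP IDs are all hypothetical.

```r
# Hypothetical helper: step through a file and keep only the requested lines,
# so that the unselected lines are never held in memory together.
readSampledLines <- function(file, lineNumbers) {
    lineNumbers <- sort(unique(lineNumbers))
    con <- file(file, open = "r")
    on.exit(close(con))
    kept <- character(length(lineNumbers))
    current <- 0
    for (i in seq_along(lineNumbers)) {
        # Read and discard the lines before the next requested one
        toSkip <- lineNumbers[i] - current - 1
        if (toSkip > 0) readLines(con, n = toSkip)
        kept[i] <- readLines(con, n = 1)
        current <- lineNumbers[i]
    }
    kept
}

set.seed(1)
# File name = genoFilePrefix + "chr" + chromosome number + genoFileSuffix
genoFilePrefix <- file.path(tempdir(), "toy_genotypes_")
genoFileSuffix <- ".csv"
chr <- 22
genoFile <- paste0(genoFilePrefix, "chr", chr, genoFileSuffix)

# Toy [SNPs x samples] file: 20 SNPs, 5 samples, SNP_ID first, no header
nrSNPs <- 20
nrSamples <- 5
toy <- cbind(paste0("rs", seq_len(nrSNPs)),
             matrix(sample(0:2, nrSNPs * nrSamples, replace = TRUE),
                    nrow = nrSNPs))
write.table(toy, genoFile, sep = ",", quote = FALSE,
            row.names = FALSE, col.names = FALSE)

# Count the SNPs (lines) in the file, then draw random line numbers
NrSNPsInFile <- length(count.fields(genoFile, sep = ","))
chosenLines <- sort(sample(NrSNPsInFile, 4))

# Read only the chosen lines and reshape to [samples x SNPs]
sampledLines <- readSampledLines(genoFile, chosenLines)
fields <- strsplit(sampledLines, ",", fixed = TRUE)
snpIDs <- vapply(fields, `[`, character(1), 1)
causalSNPs <- sapply(fields, function(f) as.numeric(f[-1]))
colnames(causalSNPs) <- snpIDs
```

Counting the lines is the step that NrSNPsOnChromosome lets you skip: if the number of SNPs per chromosome file is already known, passing it avoids a full pass over each file.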

Value

[N x NrCausalSNPs] Matrix of randomly drawn genotypes [integer]/ [double]

See Also

standardiseGenotypes

Examples

# Get causal SNPs from genotypes simulated within PhenotypeSimulator
geno <- simulateGenotypes(N=10, NrSNP=10)
causalSNPsFromSimulatedGenoStandardised <- getCausalSNPs(N=10,
    NrCausalSNPs=10, genotypes=geno$genotypes)

# Get causal SNPs by sampling lines from large SNP files
genotypeFile <- system.file("extdata/genotypes/",
    "genotypes_chr22.csv",
    package = "PhenotypeSimulator")
genoFilePrefix <- gsub("chr.*", "", genotypeFile)
genoFileSuffix <- ".csv"
causalSNPsFromLines <- getCausalSNPs(N=50, NrCausalSNPs=10, chr=22,
    genoFilePrefix=genoFilePrefix,
    genoFileSuffix=genoFileSuffix)
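
The prefix derivation in the example above relies on gsub removing everything from the first occurrence of "chr" onwards. The short sketch below uses hypothetical paths (none refer to real files) to show the result, along with a caveat when a directory name itself contains "chr". The anchored pattern is one possible alternative, not something the package prescribes.

```r
# Hypothetical path, for illustration only
genotypeFile <- "/path/to/dir/genotypes_chr22.csv"
genoFilePrefix <- gsub("chr.*", "", genotypeFile)
genoFileSuffix <- ".csv"

# Prefix, "chr", chromosome number and suffix rebuild the original file name
rebuilt <- paste0(genoFilePrefix, "chr", 22, genoFileSuffix)

# Caveat: the pattern matches the first "chr" anywhere in the path
trickyFile <- "/home/chris/genotypes_chr22.csv"
trickyPrefix <- gsub("chr.*", "", trickyFile)

# A more defensive alternative: match only "chr<digits>.csv" at the end
saferPrefix <- sub("chr[0-9]+\\.csv$", "", trickyFile)
```

Here trickyPrefix is cut off inside the directory name, whereas saferPrefix keeps the full path up to the chromosome label.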

PhenotypeSimulator documentation built on May 14, 2018, 1:04 a.m.