read_data | R Documentation |
Reads genotype, pedigree, and phenotype data files
read_data(
  genofile,
  ploidy = 4,
  pedfile,
  phenofile = NULL,
  fixed = NULL,
  bin.markers = TRUE,
  dominance = NULL,
  n.core = 1
)
genofile |
File with map and genotype probabilities |
ploidy |
Either 2 or 4 |
pedfile |
File with pedigree data (id,parent1,parent2) |
phenofile |
File with phenotype data (optional) |
fixed |
If there are fixed effects, this is a character vector with one entry per effect, each either "factor" or "numeric" |
bin.markers |
TRUE/FALSE whether to bin markers with the same cM position |
dominance |
Maximum value of dominance that will be used for analysis. Default = ploidy. |
n.core |
Number of cores for parallel execution |
The first 3 columns of the genotype file should be the genetic map (labeled marker, chrom, cM), and a fourth column for a reference genome position (labeled bp) can also be included. The map is followed by the members of the population. The genotype data for each marker x individual combination is a string with the format "state|state|state...=>prob|prob|prob...", where "state" refers to the genotype state and "prob" is the genotype probability in decimal format. Only states with nonzero probabilities need to be listed. The encoding for the states in tetraploids is described in the documentation for the F1codes and S1codes datasets that come with the package. For diploids, there are 4 F1 genotype codes, 1,2,3,4, which correspond to haplotype combinations 1-3,1-4,2-3,2-4, respectively; the S1 genotype codes 1,2,3 correspond to 1-1,1-2,2-2, respectively.
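The genotype string format described above can be unpacked with a few lines of base R. This is a minimal sketch, not part of diaQTL: the helper name parse_geno_string and the example string are hypothetical, and it assumes the states are integer codes, as they are for the diploid codes listed above.

```r
# Hypothetical helper (not a diaQTL function): split one genotype entry of the
# form "state|state|...=>prob|prob|..." into a table of states and probabilities.
parse_geno_string <- function(x) {
  parts <- strsplit(x, "=>", fixed = TRUE)[[1]]
  stopifnot(length(parts) == 2)
  # Assumption: states are integer codes
  states <- as.integer(strsplit(parts[1], "|", fixed = TRUE)[[1]])
  probs <- as.numeric(strsplit(parts[2], "|", fixed = TRUE)[[1]])
  stopifnot(length(states) == length(probs))
  data.frame(state = states, prob = probs)
}

# Made-up entry for a diploid F1 individual: state 1 with probability 0.9,
# state 3 with probability 0.1
example <- parse_geno_string("1|3=>0.9|0.1")

# Diploid F1 codes and their haplotype combinations, as listed above
diploid_F1_haplotypes <- c("1" = "1-3", "2" = "1-4", "3" = "2-3", "4" = "2-4")
example$haplotypes <- unname(diploid_F1_haplotypes[as.character(example$state)])
print(example)
```

Only states with nonzero probability appear in the string, so the parsed table can be shorter than the full list of possible states.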
For the phenotype file, the first column is id, followed by the traits, and then any fixed effects. Pass a character vector for the function argument "fixed" to specify whether each effect is a factor or a numeric covariate. The number of traits is deduced from the number of columns. Binary traits must be coded N/Y and are converted to 0/1 internally for analysis by probit regression. Missing data in the phenotype file should be coded as NA.
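The phenotype layout can be illustrated by writing a small file from R. This is only a sketch under stated assumptions: the column names and values are invented, the file is assumed to be CSV (as in the package's example data), and the trait count and the N/Y recoding shown here merely mirror the description above rather than the package's internal code.

```r
# Made-up phenotype table: id first, then traits, then fixed effects
pheno <- data.frame(
  id        = c("ind1", "ind2", "ind3"),
  yield     = c(10.2, NA, 8.7),      # quantitative trait, missing value as NA
  resistant = c("Y", "N", "Y"),      # binary trait coded N/Y
  block     = c("A", "B", "A"),      # fixed effect treated as a factor
  stand     = c(20, 18, 19)          # fixed effect treated as a numeric covariate
)

# Write to a temporary CSV file; NA is written as "NA" by default
phenofile <- tempfile(fileext = ".csv")
write.csv(pheno, phenofile, row.names = FALSE)

# One entry per fixed-effect column, in column order
fixed <- c("factor", "numeric")

# Columns that are neither id nor fixed effects are traits
n_traits <- ncol(pheno) - 1 - length(fixed)

# Illustration of the N/Y to 0/1 recoding described above
resistant01 <- ifelse(pheno$resistant == "Y", 1, 0)
```

With this layout, phenofile and fixed would be passed as the phenofile and fixed arguments of read_data.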
The parameter dominance specifies the maximum value of dominance that can be used in subsequent analysis: 1 = additive, 2 = digenic dominance, 3 = trigenic dominance, 4 = quadrigenic dominance. The default is dominance = ploidy, which allows the full range of dominance models in functions such as scan1 and fitQTL, but this requires the most RAM. Output files from the BGLR package are stored in a folder named 'tmp' in the current directory.
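When RAM is a concern, the dominance argument can be capped below the ploidy. The sketch below builds such a call using the example files shipped with the package; the choice of dominance = 2 is an assumption made for illustration, and the call runs only if diaQTL is installed and the example file is found.

```r
# Arguments for read_data, limiting later analyses to additive and
# digenic dominance models (dominance = 2) for a tetraploid population
args <- list(
  genofile  = system.file("vignette_data", "potato_geno.csv", package = "diaQTL"),
  ploidy    = 4,
  pedfile   = system.file("vignette_data", "potato_ped.csv", package = "diaQTL"),
  dominance = 2
)

# The maximum dominance cannot meaningfully exceed the ploidy
stopifnot(args$dominance >= 1, args$dominance <= args$ploidy)

# Run only when the package and its example data are available
if (requireNamespace("diaQTL", quietly = TRUE) && nzchar(args$genofile)) {
  diallel_small <- do.call(diaQTL::read_data, args)
}
```

Because phenofile is left at its default of NULL here, the result would be of class diallel_geno.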
Variable of class diallel_geno if phenofile is NULL, otherwise diallel_geno_pheno
## Not run:
## Get the location of the example csv files
genocsv = system.file( "vignette_data", "potato_geno.csv", package = "diaQTL" )
pedcsv = system.file( "vignette_data", "potato_ped.csv", package = "diaQTL" )
phenocsv = system.file( "vignette_data", "potato_pheno.csv", package = "diaQTL" )
## Check their location in the system
print(genocsv)
print(pedcsv)
print(phenocsv)
## Load them in R
diallel_example <- read_data(genofile = genocsv,
                             ploidy = 4,
                             pedfile = pedcsv,
                             phenofile = phenocsv)
## End(Not run)