This function prepares genomic data for use in packages or software that perform genomic prediction.
data |
object of class |
frame |
Format of genomic data to be imputed. Two formats are currently supported. |
hapmap |
|
base |
|
sweep.sample |
|
call.rate |
|
maf |
Threshold for removing SNPs by minor allele frequency. Must be between 0 and 1 |
imput |
Should imputation of missing data be performed? Default is |
imput.type |
Type of imputation. It can be "wright", "mean" or "knni". See |
outfile |
|
plot |
If |
The function allows flexible input of genomic data. Data may be in long format with four columns or in wide format, with markers in columns and individuals in rows. Both numeric and nitrogenous-base coding are accepted. Samples and markers can be eliminated based on their missing data rate. Markers can also be eliminated based on the frequency of the minor allele. Three methods of imputation are currently implemented. The first combines the allele frequency with the individual's observed heterozygosity estimated from markers:
p(x_{ij}) = \left\{ \begin{array}{ll} 0 = (1 - p_j)^2 + p_j (1 - p_j) F_i \\ 1 = 2 p_j (1 - p_j) - 2 p_j (1 - p_j) F_i \\ 2 = p_j^2 + p_j (1 - p_j) F_i \end{array} \right.
Hence, missing genotypes are imputed according to their probability of occurrence. This probability depends on both the genotype frequency and the inbreeding of the individual at a specific locus. The second method is based on the mean of the SNP: each missing point in SNP j is replaced by the mean of SNP j:
x_{ij} = 2p_j
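The two formulas above can be illustrated with a minimal base R sketch. It is not the package's internal code: the helper name `wright_probs`, the example values of `p` and `f`, and the toy SNP vector are assumptions made for illustration, and the inbreeding coefficient is simply passed in rather than estimated from markers.

```r
# Genotype probabilities for one missing call, given the allele frequency p
# of SNP j and the inbreeding coefficient f of individual i
# (hypothetical helper mirroring the three expressions in the formula).
wright_probs <- function(p, f) {
  q <- 1 - p
  c("0" = q^2 + p * q * f,
    "1" = 2 * p * q - 2 * p * q * f,
    "2" = p^2 + p * q * f)
}

probs <- wright_probs(p = 0.3, f = 0.2)  # example values, not from real data

# Draw one genotype (0, 1 or 2) according to these probabilities.
set.seed(1)
imputed_wright <- sample(0:2, size = 1, prob = probs)

# Mean imputation: replace each missing call in a SNP by 2 * p_j,
# which equals the mean of the observed calls for that SNP.
snp <- c(0, 1, NA, 2, 1)                  # toy SNP coded 0/1/2
p_j <- mean(snp, na.rm = TRUE) / 2        # allele frequency from observed calls
snp_mean <- snp
snp_mean[is.na(snp_mean)] <- 2 * p_j
```

The three probabilities sum to one for any `p` and `f`, so they can be passed directly to `sample()` as weights.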
The "knni" method imputes missing markers using the mean of the k-nearest markers. Nearest markers are found by computing the Euclidean distance between markers. If you use this option, please cite the package impute (Hastie et al. 2017) in publications.
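The idea of nearest-marker imputation can be sketched in base R as follows. This is a simplified illustration, not the algorithm of the impute package: the function name `knn_impute_markers`, the toy matrix, and the choice to measure distance as a root-mean-square difference over jointly observed individuals are all assumptions.

```r
# Simplified k-nearest-marker imputation (hypothetical helper).
# M: numeric matrix, individuals in rows, markers in columns, NA = missing.
knn_impute_markers <- function(M, k = 2) {
  out <- M
  for (j in seq_len(ncol(M))) {
    miss <- which(is.na(M[, j]))
    if (length(miss) == 0) next
    others <- setdiff(seq_len(ncol(M)), j)
    # Distance from marker j to every other marker, using only the
    # individuals observed in both markers.
    d <- sapply(others, function(h) {
      ok <- !is.na(M[, j]) & !is.na(M[, h])
      if (!any(ok)) return(Inf)
      sqrt(mean((M[ok, j] - M[ok, h])^2))
    })
    nearest <- others[order(d)][seq_len(min(k, length(others)))]
    # Fill each missing call with the mean of the neighbours' calls for
    # that individual (NaN if all neighbours are missing there too).
    for (i in miss) {
      out[i, j] <- mean(M[i, nearest], na.rm = TRUE)
    }
  }
  out
}

M <- cbind(s1 = c(0, 1, 2, NA),
           s2 = c(0, 1, 2, 2),
           s3 = c(0, 1, 2, 2),
           s4 = c(2, 1, 0, 0))
imputed <- knn_impute_markers(M, k = 2)
```

In this toy matrix the markers closest to `s1` are `s2` and `s3`, so the missing call of the fourth individual is filled with the mean of their values for that individual.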
Returns a properly coded marker matrix and a report specifying which individuals are removed by sweep.sample and which markers are removed by call.rate and maf. When plot is TRUE and the map is included, a plot showing the proportion of removed markers and imputed data for each chromosome is also produced.
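The quality-control steps that produce such a report (removing samples by missing rate, then markers by call rate and minor allele frequency) can be sketched in base R. This is only an illustration under assumptions: the threshold names, their values, the order of the steps, and the layout of `report` are hypothetical and do not reproduce the function's actual output.

```r
# Toy marker matrix: individuals in rows, SNPs in columns, coded 0/1/2.
M <- matrix(c(0, 1, 2, NA,
              1, 1, NA, NA,
              2, 0, 2, NA,
              0, 0, 2, 2,
              1, 2, 2, 0),
            nrow = 5, byrow = TRUE,
            dimnames = list(paste0("ind", 1:5), paste0("snp", 1:4)))

max_sample_missing <- 0.4   # hypothetical: max missing rate per individual
min_call_rate      <- 0.75  # hypothetical: min proportion of calls per SNP
min_maf            <- 0.05  # hypothetical: min minor allele frequency

# 1. Drop individuals with too many missing calls.
keep_ind <- rowMeans(is.na(M)) <= max_sample_missing
M1 <- M[keep_ind, , drop = FALSE]

# 2. Drop SNPs whose call rate is below the threshold.
keep_cr <- colMeans(!is.na(M1)) >= min_call_rate
M2 <- M1[, keep_cr, drop = FALSE]

# 3. Drop SNPs whose minor allele frequency is below the threshold.
p   <- colMeans(M2, na.rm = TRUE) / 2
maf <- pmin(p, 1 - p)
M_clean <- M2[, maf >= min_maf, drop = FALSE]

# A simple record of what was removed at each step.
report <- list(removed_samples      = rownames(M)[!keep_ind],
               removed_by_call_rate = colnames(M1)[!keep_cr],
               removed_by_maf       = colnames(M2)[maf < min_maf])
```

Filtering samples before markers means the call rate of each SNP is computed only on the retained individuals; reversing the order can change which markers survive.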
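library(snpReady)  # assumed: the package that provides raw.data and maize.line
data(maize.line)
M <- as.matrix(maize.line)
# Long-format input with nitrogenous-base coding; filter only, no imputation
mrc <- raw.data(M, frame="long", base=TRUE, sweep.sample=0.8,
                call.rate=0.95, maf=0.05, imput=FALSE, outfile="-101")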