---
title: "Genomic Prediction of Cross Performance with gpcp"
author: "Marlee Labroo, Christine Nyaga, Lukas Mueller"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Genomic Prediction of Cross Performance with gpcp}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---
This vignette demonstrates how to use the `gpcp` package to perform genomic prediction of cross performance from genotype and phenotype data. The workflow proceeds in several steps: loading the required packages, converting the genotype data, processing the phenotype data, fitting mixed models, and predicting cross performance from weighted marker effects.
The package is particularly useful for users working with polyploid species, and it integrates with the `sommer`, `AGHmatrix`, and `snpStats` packages for efficient model fitting and genomic analysis.
If you haven't installed the `gpcp` package yet, you can do so by following these steps:
    # Install devtools if you don't have it
    install.packages("devtools")

    # Install BiocManager, which is needed to install VariantAnnotation and snpStats
    if (!requireNamespace("BiocManager", quietly = TRUE))
      install.packages("BiocManager")

    # Install VariantAnnotation and snpStats
    BiocManager::install("VariantAnnotation")
    BiocManager::install("snpStats")

    # Install gpcp from GitHub
    devtools::install_github("cmn92/gpcp")
The main function in this package is `runGPCP()`, which predicts the performance of crosses from genomic data. To run this function, you'll need two main input files:
1. A phenotype file, which is typically a CSV file containing the phenotypic data.
2. A genotype file, which can be in VCF or HapMap format.
Let’s walk through a simple example to predict cross performance using the provided phenotype and genotype data.
Before running `runGPCP()`, load the phenotype data from a CSV file and specify the genotype file path.
    # Load phenotype data
    phenotypeFile <- read.csv("~/gpcp/data/phenotypeFile.csv")

    # Specify the genotype file path (VCF or HapMap format)
    genotypeFile <- "~/gpcp/data/genotypeFile_Chr9and11.vcf"
You will need to specify several inputs: the name of the genotype ID column, the traits to predict, and further settings such as trait weights, fixed effects, ploidy, and the number of crosses to return.
    # Define inputs
    genotypes <- "Accession"       # Column name for genotype IDs in phenotype data
    traits    <- c("YIELD", "DMC") # Traits to predict
    weights   <- c(3, 1)           # Weights for each trait
    userFixed <- c("LOC", "REP")   # Fixed effects
    Ploidy    <- 2                 # Ploidy level
    NCrosses  <- 150               # Number of crosses to predict
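Because these inputs refer to columns of the phenotype table by name, a quick consistency check can catch typos before any model is fitted. The following is a minimal sketch, not part of `gpcp`: the `pheno` data frame and its values are made up for illustration, and only the column names follow the inputs defined above.

```r
# Toy phenotype table (hypothetical values; column names mirror the inputs above)
pheno <- data.frame(
  Accession = c("G1", "G1", "G2", "G2"),
  LOC       = c("L1", "L2", "L1", "L2"),
  REP       = c(1, 1, 1, 1),
  YIELD     = c(20.1, 22.4, 18.7, 19.9),
  DMC       = c(35.2, 34.8, 38.1, 37.5),
  stringsAsFactors = FALSE
)

genotypes <- "Accession"
traits    <- c("YIELD", "DMC")
weights   <- c(3, 1)
userFixed <- c("LOC", "REP")

# Every column referenced by the inputs must exist in the phenotype table
needed       <- c(genotypes, traits, userFixed)
missing_cols <- setdiff(needed, names(pheno))
stopifnot(length(missing_cols) == 0)

# There must be one weight per trait, and trait columns must be numeric
stopifnot(length(weights) == length(traits))
stopifnot(all(vapply(pheno[traits], is.numeric, logical(1))))
```

If any check fails, `stopifnot()` raises an error that names the failing condition, which is usually easier to interpret than an error raised deep inside model fitting.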
Now that we have the necessary inputs, we can run the `runGPCP()` function to predict cross performance.
    # Run genomic prediction of cross performance
    finalcrosses <- runGPCP(
      phenotypeFile = phenotypeFile,
      genotypeFile  = genotypeFile,
      genotypes     = genotypes,
      traits        = paste(traits, collapse = ","),
      weights       = weights,
      userFixed     = paste(userFixed, collapse = ","),
      Ploidy        = Ploidy,
      NCrosses      = NCrosses
    )
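Note that the call above passes `traits` and `userFixed` as single comma-separated strings rather than as character vectors. The short sketch below uses only base R to show what `paste(..., collapse = ",")` produces; the names `traits_arg`, `fixed_arg`, and `recovered` are illustrative and do not come from the package.

```r
traits    <- c("YIELD", "DMC")
userFixed <- c("LOC", "REP")

# Collapse each character vector into one comma-separated string
traits_arg <- paste(traits, collapse = ",")
fixed_arg  <- paste(userFixed, collapse = ",")

print(traits_arg)  # "YIELD,DMC"
print(fixed_arg)   # "LOC,REP"

# Splitting on the comma recovers the original vector
recovered <- strsplit(traits_arg, ",", fixed = TRUE)[[1]]
```

Since the comma acts as the separator, trait and fixed-effect column names should not themselves contain commas.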
The output of the `runGPCP()` function is a data frame that contains the predicted cross performance. You can view the top predicted crosses like this:
    # View the predicted crosses
    head(finalcrosses)
The resulting data frame contains the following columns:
- `Parent1`: The first parent of the cross.
- `Parent2`: The second parent of the cross.
- `CrossPredictedMerit`: The predicted merit of the cross.
- `P1Sex` and `P2Sex`: Optional. The sexes of the two parents, included only if sex information is provided.
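Given these columns, ranking crosses and checking how often each parent appears among the top crosses takes only a few lines of base R. The sketch below uses a hypothetical stand-in, `finalcrosses_toy`, with invented values. It assumes only the column names listed above, not any particular row order in the real output.

```r
# Hypothetical stand-in for the runGPCP() output (values are invented)
finalcrosses_toy <- data.frame(
  Parent1 = c("G1", "G1", "G2", "G3"),
  Parent2 = c("G2", "G3", "G3", "G4"),
  CrossPredictedMerit = c(0.42, 1.10, -0.15, 0.87),
  stringsAsFactors = FALSE
)

# Sort crosses from highest to lowest predicted merit
ord    <- order(finalcrosses_toy$CrossPredictedMerit, decreasing = TRUE)
ranked <- finalcrosses_toy[ord, ]

# Keep the two best crosses
top2 <- head(ranked, 2)

# Count how often each parent appears among the selected crosses
parent_use <- table(c(top2$Parent1, top2$Parent2))
print(top2)
print(parent_use)
```

Counting parent usage is a simple way to see whether a few parents dominate the selected crosses, which can matter when managing diversity in a crossing block.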
The `runGPCP()` function performs the following steps internally:
1. Read the genotype and phenotype data: The genotype file is converted into a matrix of allele counts, and the phenotype data is standardized.
2. Fit mixed models: The `sommer` package is used to fit mixed models based on user-defined fixed and random effects.
3. Predict cross performance: Marker effects are calculated and weighted to predict the performance of crosses, and the best crosses are identified.
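To make these steps more concrete, here is a deliberately simplified sketch in base R. It is not the `gpcp` implementation. It skips the mixed model entirely, uses invented marker effects, and predicts each cross with an additive-only mid-parent rule (the mean of the two parents' summed marker effects), which is an assumption made here for illustration. All data and object names are hypothetical.

```r
# Step 1 (simplified): standardize each trait, then combine with trait weights
pheno <- data.frame(
  Accession = c("G1", "G2", "G3", "G4"),
  YIELD     = c(20, 24, 18, 22),
  DMC       = c(36, 33, 39, 34),
  stringsAsFactors = FALSE
)
traits  <- c("YIELD", "DMC")
weights <- c(3, 1)

z     <- scale(pheno[traits])        # each column: mean 0, sd 1
index <- as.numeric(z %*% weights)   # weighted selection index per accession

# Toy allele-count matrix for a diploid (0, 1, or 2 copies of the counted allele)
M <- matrix(
  c(0, 1, 2,
    2, 1, 0,
    1, 1, 1,
    2, 2, 0),
  nrow = 4, byrow = TRUE,
  dimnames = list(pheno$Accession, c("m1", "m2", "m3"))
)

# Step 2 is skipped here: these marker effects are invented, whereas in
# practice they would be derived from a fitted model
effects <- c(m1 = 0.5, m2 = -0.2, m3 = 0.1)

# Step 3 (simplified): additive value per parent, then mid-parent value per cross
gebv <- as.numeric(M %*% effects)
names(gebv) <- rownames(M)

pairs   <- t(combn(rownames(M), 2))  # all unordered parent pairs
crosses <- data.frame(
  Parent1   = pairs[, 1],
  Parent2   = pairs[, 2],
  MidParent = unname((gebv[pairs[, 1]] + gebv[pairs[, 2]]) / 2),
  stringsAsFactors = FALSE
)

# Identify the best cross under this simplified rule
best <- crosses[which.max(crosses$MidParent), ]
print(best)
```

A mid-parent rule ignores dominance and any ploidy-specific effects, so it should be read only as an illustration of how marker effects and allele counts combine into a ranking of crosses.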
The methodology behind the `gpcp` package is based on the following references:
- Xiang, J., et al. (2016). "Mixed Model Methods for Genomic Prediction." Nature Genetics.
- Batista, L., et al. (2021). "Genetic Prediction and Relationship Matrices." Theoretical and Applied Genetics.
The `gpcp` package provides a flexible and efficient framework for predicting cross performance from genomic data in both diploid and polyploid species. Because it handles multiple traits, fixed effects, and random effects, it is well suited to breeders and geneticists who want to choose crosses using genomic data.