knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
library(qzwslrm)

Background

As described in Legarra and Reverter (2018), validation of predicted breeding values is becoming increasingly important. Their method is a cross-validation approach that compares the results of successive genetic evaluations. These results consist of $\hat{u}_p$ and $\hat{u}_w$, two vectors of the same length that contain predicted breeding values based on "partial" ($p$) and "whole" ($w$) data, respectively.

Cross-Validation Statistics

The validation statistics are computed from the inputs $\hat{u}_p$ and $\hat{u}_w$.
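As a rough illustration, three of the statistics proposed by Legarra and Reverter (2018) can be computed from the two vectors with base R alone. The sketch below assumes the usual definitions: bias as the difference of means, dispersion as the slope of the regression of $\hat{u}_w$ on $\hat{u}_p$, and the correlation between the two vectors. The helper `lr_stats_sketch()` and the toy numbers are hypothetical and are not part of qzwslrm; the statistics actually returned by the package may differ.

```r
# Hypothetical helper for illustration only; not a qzwslrm function.
lr_stats_sketch <- function(u_p, u_w) {
  stopifnot(length(u_p) == length(u_w))
  list(
    # bias: difference of mean EBV, expected to be 0
    bias = mean(u_p) - mean(u_w),
    # dispersion: slope of the regression of u_w on u_p, expected to be 1
    slope = cov(u_w, u_p) / var(u_p),
    # correlation between partial and whole EBV
    correlation = cor(u_p, u_w)
  )
}

# Toy vectors, invented for this sketch
u_p <- c(-1.0, 0.0, 1.0, 2.0)
u_w <- c(-1.5, 0.5, 1.0, 2.0)
res <- lr_stats_sketch(u_p, u_w)
print(res)
```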

Example Data

As shown in https://fbzwsqualitasag.github.io/qzwslrm/articles/qzwslrm_ebv_mix99.html, example data are prepared to run the most basic validation statistics. The example data consist of two result files from MiX99 containing predicted breeding values (EBV). The first result file contains EBV predicted from the complete dataset.

(s_ebv_path_whole <- qzwslrm_example_solani("whole"))

The second result file contains EBV predicted for the same animals but based on a partial dataset. For our example, the phenotypic observations of all animals from the latest generation were set to missing (NA).

(s_ebv_path_partial <- qzwslrm_example_solani("partial"))

These files can be read using the function readr_solani().

(tbl_solani_whole <- readr_solani(ps_path = s_ebv_path_whole))

The same can be done with the partial data.

(tbl_solani_partial <- readr_solani(ps_path = s_ebv_path_partial))

Validation

For the validation, only the EBV vectors from the whole and the partial data are required. Hence the validation function needs only these two vectors as input.

(l_val_result <- val_ebv_lrm(pvec_ebv_partial = tbl_solani_partial$ebv, 
                             pvec_ebv_whole = tbl_solani_whole$ebv))
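Because only the two vectors are passed, element i of each vector has to refer to the same animal. The following is a minimal sketch of how the two tables could be aligned before extracting the vectors. It uses toy data frames in place of the example tables and assumes that both contain the columns `animal` and `ebv`; the object names are hypothetical.

```r
# Toy tables standing in for the partial and whole results;
# the column names `animal` and `ebv` are assumed.
tbl_partial <- data.frame(animal = c(3, 1, 2), ebv = c(0.3, 0.1, 0.2))
tbl_whole   <- data.frame(animal = c(1, 2, 3), ebv = c(0.15, 0.25, 0.35))

# Join on the animal ID so that row i refers to the same animal in both columns.
tbl_both <- merge(tbl_partial, tbl_whole, by = "animal",
                  suffixes = c("_partial", "_whole"))

# Aligned vectors of equal length
vec_ebv_partial <- tbl_both$ebv_partial
vec_ebv_whole   <- tbl_both$ebv_whole
```

The aligned vectors could then be supplied as `pvec_ebv_partial` and `pvec_ebv_whole`.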

Other Input Formats

For flexibility, EBV can be read from files in several formats, such as 'csv', 'csv2', 'delim' or 'table'. For testing purposes, input files in these formats are produced and stored inside this package.

# csv
s_pkg_input_dir <- system.file("extdata", "ebvinput", package = "qzwslrm")
s_csv_input_path <- file.path(s_pkg_input_dir, "ebv_input.csv")
readr::write_csv(tbl_solani_partial, file = s_csv_input_path)

# csv2
s_csv2_input_path <- file.path(s_pkg_input_dir, "ebv_input.csv2")
readr::write_csv2(tbl_solani_partial, file = s_csv2_input_path)

# delim - .txt
s_delim_input_path <- file.path(s_pkg_input_dir, "ebv_input.txt")
readr::write_delim(tbl_solani_partial, file = s_delim_input_path, delim = " ")

The input files included in the package can be obtained and read via

sexdf_csv <- qzwslrm_example_input("csv")
tbl_ebv_csv <- readr_ebv(ps_path = sexdf_csv, ps_format = "csv", 
                     ps_animal_col_name = "animal", ps_ebv_col_name = "ebv")
tbl_ebv_csv

Testing the input format "table"

sexdf_delim <- qzwslrm_example_input(ps_data_format = "delim")
tbl_ebv_table <- readr_ebv(ps_path = sexdf_delim, ps_format = "table",
                           ps_animal_col_name = "animal", ps_ebv_col_name = "ebv")
tbl_ebv_table

The generic function readr_ebv() reads EBV from input files in different formats. The input files may contain more information than just the EBV, but the result of the reader function is always a tibble with two columns: the first contains the animal IDs and the second the EBV of each animal.
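The idea of reducing an arbitrary input file to two columns can be sketched with base R. The function `read_two_col_sketch()` below is a hypothetical stand-in and not the implementation of readr_ebv(). It returns a plain data frame rather than a tibble to avoid extra dependencies, and the toy file is written to a temporary path.

```r
# Hypothetical stand-in for a generic reader; not the readr_ebv() implementation.
read_two_col_sketch <- function(path, animal_col, ebv_col, sep = ",") {
  df <- read.table(path, header = TRUE, sep = sep, stringsAsFactors = FALSE)
  # Keep only the animal ID and the EBV, under fixed column names
  data.frame(animal = df[[animal_col]], ebv = df[[ebv_col]])
}

# Write a toy file with an extra column, then read back only ID and EBV.
tmp <- tempfile(fileext = ".csv")
write.csv(data.frame(animal = 1:3,
                     sex = c("f", "m", "f"),
                     ebv = c(0.1, -0.2, 0.3)),
          tmp, row.names = FALSE)
tbl_sketch <- read_two_col_sketch(tmp, animal_col = "animal", ebv_col = "ebv")
print(tbl_sketch)
```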

Summary Output

The output of the validation function is formatted using a summary function summary_lrm().

tbl_ebv_whole <- readr_ebv(ps_path = qzwslrm_example_solani("whole"), ps_format = "table",
                            pn_ebv_col_idx = 4)
tbl_ebv_partial <- readr_ebv(ps_path = qzwslrm_example_solani("partial"), ps_format = "table",
                            pn_ebv_col_idx = 4)
summary_lrm(l_val_result <- val_ebv_lrm(pvec_ebv_partial = tbl_ebv_partial$ebv, 
                             pvec_ebv_whole = tbl_ebv_whole$ebv))

Result Tibble

The results can also be converted into a tibble, which is useful when the output should be shown as a table.

tbl_ebv_whole <- readr_ebv(ps_path = qzwslrm_example_solani("whole"), ps_format = "table",
                           pn_ebv_col_idx = 4)
tbl_ebv_partial <- readr_ebv(ps_path = qzwslrm_example_solani("partial"), ps_format = "table",
                             pn_ebv_col_idx = 4)
tibble_lrm(l_val_result <- val_ebv_lrm(pvec_ebv_partial = tbl_ebv_partial$ebv,
                                       pvec_ebv_whole = tbl_ebv_whole$ebv))

Scatterplots of EBV from Whole and Partial Data

The function scatterplot_lrm() creates a scatterplot of the EBV from whole and partial data.

tbl_ebv_whole <- readr_ebv(ps_path = qzwslrm_example_solani("whole"), ps_format = "table",
                            pn_ebv_col_idx = 4)
tbl_ebv_partial <- readr_ebv(ps_path = qzwslrm_example_solani("partial"), ps_format = "table",
                              pn_ebv_col_idx = 4)
p <- scatterplot_lrm(tbl_ebv_whole, tbl_ebv_partial)
print(p)

The plot above shows the EBV from the whole data against the EBV from the partial data. The blue line is the fitted linear regression line and the red line is the expected regression line with a slope of $1$.
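A comparable picture can be drawn with base graphics, which makes the comparison between the fitted and the expected line explicit. This is only a sketch with invented toy vectors. It assumes the partial EBV on the x-axis and the whole EBV on the y-axis, and an expected line through the origin; the layout produced by scatterplot_lrm() may differ.

```r
# Toy vectors, invented for this sketch
ebv_partial <- c(-1.0, 0.0, 1.0, 2.0)
ebv_whole   <- c(-1.5, 0.5, 1.0, 2.0)

# Regression of whole-data EBV on partial-data EBV
fit <- lm(ebv_whole ~ ebv_partial)

plot(ebv_partial, ebv_whole,
     xlab = "EBV from partial data", ylab = "EBV from whole data")
abline(fit, col = "blue")          # fitted regression line
abline(a = 0, b = 1, col = "red")  # expected line with slope 1 (intercept 0 assumed)

# Fitted slope, to be compared with the expected value of 1
slope_fit <- unname(coef(fit)[2])
```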



fbzwsqualitasag/qzwslrm documentation built on Sept. 13, 2022, 3:28 p.m.