sample_hp: Sample from historical population
In xbreed: Genomic Simulation of Purebred and Crossbred Populations


View source: R/Sample_hp_v9_Commented.R

Samples individuals from a historical population as founders and simulates subsequent generations of a recent population based on user-defined selection parameters.

sample_hp(hp_out, Male_founders, Female_founders, ng, litter_size, Selection, Training, saveAt, sh_output, Display)

`hp_out`	(`list`) Output of function `make_hp`.
`Male_founders`	(`data.frame`) Data frame with 1 row and 3 columns, as follows. Column 1) "number": the number of male individuals to be selected from the last generation of the historical population. Column 2) "select": the type of selection, with options "rnd" (select individuals randomly), "phen" (select individuals based on their phenotypes) and "tbv" (select individuals based on their true breeding value). Column 3) "value": whether to select high ("h") or low ("l") values. Note: This column is ignored if individuals are selected randomly.
`Female_founders`	(`data.frame`) Data frame with 1 row and 3 columns, as follows. Column 1) "number": the number of female individuals to be selected from the last generation of the historical population. Column 2) "select": the type of selection, with options "rnd" (select individuals randomly), "phen" (select individuals based on their phenotypes) and "tbv" (select individuals based on their true breeding value). Column 3) "value": whether to select high ("h") or low ("l") values. Note: This column is ignored if individuals are selected randomly.
`ng`	Number of generations. Range: 1 ≤ `ng` ≤ 500.
`litter_size`	Litter size or the number of progeny per dam. Range: 1 ≤ `litter_size` ≤ 200.
`Selection`	(`data.frame`) Data frame with 2 rows and 3 columns. The first row is the selection design for males and the second row is the selection design for females. The columns are as follows. Column 1) "size": the number of individuals to be selected as sires/dams. Column 2) "type": the type of selection, with options "rnd" (select individuals randomly), "phen" (select individuals based on their phenotypes), "tbv" (select individuals based on their true breeding value) and "gebv" (select individuals based on their genomic estimated breeding value). Column 3) "value": whether to select high ("h") or low ("l") values. Note: This column is ignored if individuals are selected randomly.
`Training`	Optional (`data.frame`) Data frame with 1 row and 8 columns. The columns are as follows. Column 1) "size": the number of individuals to be selected for training. Column 2) "sel": Optional (`character`) the type of selection of individuals for training. The possible options are: "rnd" (select individuals for training randomly); "min_rel_mrk" (select individuals for training such that the genomic relationship among them, based on marker information, is minimal); "max_rel_mrk" (as above, but maximal); "min_rel_qtl" (select individuals for training such that the genomic relationship among them, based on QTL information, is minimal); "max_rel_qtl" (as above, but maximal). Default: "rnd". Column 3) "method": Optional (`character`) the method used for the estimation of marker effects. The possible options are: "BRR" (Gaussian prior); "BayesA" (scaled-t prior); "BL" (double-exponential prior); "BayesB" (two-component mixture prior with a point of mass at zero and a scaled-t slab); "BayesC" (two-component mixture prior with a point of mass at zero and a Gaussian slab). Default: "BRR". Column 4) "nIter": Optional. The number of iterations. Default: 1500. Column 5) "burnIn": Optional. The number of burn-in iterations. Default: 500. Column 6) "thin": Optional. The thinning interval. Default: 5. Column 7) "save": Optional. May include a path and a prefix that will be added to the names of the files saved as the program runs. Default: "Out_BGLR". Column 8) "show": Optional (`Logical`) if `TRUE`, the iteration history is printed. Default: `TRUE`. Note: This argument is compulsory if "type" in argument `Selection` is "gebv". More details about these settings can be found in package BGLR.
`saveAt`	Optional (`character`). Name to be used to save output files.
`sh_output`	Optional (`data.frame`). Data frame specifying the generation indices and the types of data to be written to output files, so the user can define which type of data is written for which generation. The possible options are: "data" (individuals' data except their genotypes); "qtl" (QTL genotypes of individuals, coded as 11, 12, 21, 22); "marker" (marker genotypes of individuals); "seq" (genotypes, both marker (SNP) and QTL, of individuals); "freq_qtl" (QTL allele frequencies); "freq_mrk" (marker allele frequencies). Note: Both arguments `sh_output` and `saveAt` must be supplied to the function in order to write the output files.
`Display`	Optional (`Logical`) Displays a summary of the simulated generations unless set to `FALSE`. Default: `TRUE`.
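
As an illustration of the data-frame arguments described above, the following minimal sketch builds founder settings and an output-control table in base R. The numbers and selection criteria are hypothetical, and the layout of `my_files` (one column per data type, one row per generation index) is inferred from the Examples section rather than stated by the argument description.

```r
# Founders: 1 row, 3 columns ("number", "select", "value").
# The counts and criteria below are hypothetical.
Male_founders <- data.frame(number = 20, select = "phen", value = "h",
                            stringsAsFactors = FALSE)
Female_founders <- data.frame(number = 40, select = "tbv", value = "h",
                              stringsAsFactors = FALSE)

# Output control (assumed layout): column names are data types,
# rows hold the generation indices to be written.
# Here "data" and "freq_qtl" are requested for generations 1 and 3.
my_files <- data.frame(data = c(1, 3), freq_qtl = c(1, 3))

str(Male_founders)
my_files
```

These objects would then be passed as `Male_founders`, `Female_founders` and `sh_output`, together with a `saveAt` name, in a call to `sample_hp`.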

Function sample_hp is used to create recent population(s). It can be called multiple times to sample individuals from the historical population created by function make_hp. To start the recent population, male and female founders are taken from the last generation of the historical population and can be selected using one of the options described for arguments Male_founders and Female_founders. In subsequent generations, individuals can be selected based on genomic estimated breeding value ("gebv"). In that case, argument Training must be supplied so that marker effects can be estimated. Individuals selected for training always come from a generation preceding the target generation. For example, to calculate GEBV for individuals in generation 4, selected individuals from generation 3 are used for training. The type of selection used to choose training individuals is controlled by argument Training. For the options "min_rel_mrk" and "max_rel_mrk", the genomic relationship matrix is constructed as follows:

G = \frac{ZZ'}{2 \sum_{j=1}^{m} p_j (1 - p_j)}

where Z = M - P. Here M is an allele-sharing matrix with m columns (m = number of markers) and n rows (n = number of genotyped individuals), and P is a matrix containing the frequency of the second allele (p_j), expressed as 2p_j. M_{ij} is 0 if the genotype of individual i for SNP j is homozygous 11, 1 if it is heterozygous, and 2 if it is homozygous 22. The frequencies are the observed allele frequencies of each SNP. After the genomic relationship matrix is constructed, individuals are sorted based on their genomic relationship. The user can choose to select individuals with low relationship ("min_rel_mrk") or high relationship ("max_rel_mrk") among each other for training. For example, with option "min_rel_mrk", the individuals selected for training have the lowest relationship with each other relative to the whole population they belong to.

The genomic relationship matrix for the options "min_rel_qtl" and "max_rel_qtl" is constructed by the same procedure, except that the QTL genotypes of individuals, rather than their marker genotypes, are used to calculate genomic relationships.
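
The construction of G described above can be sketched in a few lines of base R. This is only an illustration on simulated genotypes, not the package's internal code: the genotype matrix `M`, its dimensions and the sampling probability are hypothetical, and how sample_hp ranks individuals after building G is not reproduced here.

```r
set.seed(1)

n <- 6    # hypothetical number of genotyped individuals
m <- 20   # hypothetical number of markers

# M: genotypes coded 0 (homozygous 11), 1 (heterozygous), 2 (homozygous 22)
M <- matrix(rbinom(n * m, size = 2, prob = 0.4), nrow = n, ncol = m)

# Observed frequency of the second allele at each marker
p <- colMeans(M) / 2

# P holds 2 * p_j in every row, so Z = M - P centres each marker column
P <- matrix(2 * p, nrow = n, ncol = m, byrow = TRUE)
Z <- M - P

# G = ZZ' / (2 * sum_j p_j (1 - p_j))
G <- tcrossprod(Z) / (2 * sum(p * (1 - p)))

round(G, 2)
```

Because the observed allele frequencies are used for centring, every column of Z sums to zero, and therefore every row of G sums to zero as well. Replacing `M` with a matrix of QTL genotypes gives the analogue used for the "min_rel_qtl" and "max_rel_qtl" options.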

The main features of sample_hp are as follows:

Selection criteria can differ between males and females.
Different models can be used for the estimation of marker effects.
Multiple options for constructing the reference population for training.
Dynamic control of output files to be saved.
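
As a hedged illustration of the first three features, the sketch below sets up sex-specific selection together with a training design. All sizes are hypothetical, and the column names follow the argument descriptions for `Selection` and `Training`; whether sample_hp matches columns by name or by position is not stated on this page.

```r
# Sex-specific selection: row 1 is males, row 2 is females.
# Males are selected on high GEBV, females on high phenotype.
Selection <- data.frame(size  = c(10, 30),
                        type  = c("gebv", "phen"),
                        value = c("h", "h"),
                        stringsAsFactors = FALSE)

# Training design, needed because "gebv" is used above.
# All 8 documented columns are given explicitly.
Training <- data.frame(size   = 100,            # training individuals
                       sel    = "min_rel_mrk",  # least related by markers
                       method = "BayesB",       # model for marker effects
                       nIter  = 1500,
                       burnIn = 500,
                       thin   = 5,
                       save   = "Out_BGLR",
                       show   = FALSE,
                       stringsAsFactors = FALSE)

Selection
Training
```

Both data frames would be passed to `sample_hp` through the arguments of the same names.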

A list with all data of the simulated generations.

$output

(list) Two-level list ($output[[]][[]]) containing information about the simulated generations. The first index (x) indicates the generation. Because data for the base generation (0) are also stored by the function, the index for a specific generation equals the generation number plus one. For example, data for generation 2 are found at index 3, i.e. $output[[3]]$data. The second index (y) ranges from 1 to 6 and contains the following information:

$output[[x]]$data Individuals data except their genotypes. Here x is the generation index.
$output[[x]]$qtl QTL genotypes of individuals.
$output[[x]]$mrk Marker genotypes of individuals.
$output[[x]]$sequ Genotype (both marker (SNP) and QTL) of individuals.
$output[[x]]$freqQTL QTL allele frequency.
$output[[x]]$freqMRK Marker allele frequency.
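
The plus-one offset between generation number and list index is easy to get wrong, so a small accessor can help. The sketch below uses a mock object that only mimics the documented layout of `$output`; both `RP_mock` and the helper `get_generation` are hypothetical and not part of xbreed.

```r
# Mock result with generations 0 to 3, mimicking $output[[x]]$data
RP_mock <- list(
  output = lapply(0:3, function(g) {
    list(data = data.frame(generation = g, id = 1:2))
  })
)

# Hypothetical helper: map a generation number to its list index
get_generation <- function(res, generation) {
  res$output[[generation + 1]]
}

# Data for generation 2 are stored at index 3
get_generation(RP_mock, 2)$data
```

With a real result from `sample_hp`, the same helper would be called as `get_generation(RP, 2)$data`, which is equivalent to `RP$output[[3]]$data`.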

$summary_data: Data frame with a summary of the simulated generations.

$linkage_map_qtl: Linkage map for QTL.

$linkage_map_mrk: Linkage map for markers.

$linkage_map_qtl_mrk: Integrated linkage map for both markers and QTL.

$allele_effcts: QTL allele effects.

$trait: Trait specifications.

$genome: Genome specifications.

See also: make_hp

# # # Simulation of a recent population following a historical population. 

# CREATE HISTORICAL POPULATION

genome<-data.frame(matrix(NA, nrow=2, ncol=6))
names(genome)<-c("chr","len","nmrk","mpos","nqtl","qpos")
genome$chr<-c(1,2)
genome$len<-c(50,60)	
genome$nmrk<-c(130,75)
genome$mpos<-c("rnd","rnd")	
genome$nqtl<-c(30,30)
genome$qpos<-rep("even",2)	
genome

hp<-make_hp(hpsize=100,ng=10,h2=0.3,d2=0.15,phen_var=1,
genome=genome,mutr=5*10**-4,sel_seq_qtl=0.05,sel_seq_mrk=0.05,laf=0.5)

# # MAKE FIRST RECENT POPULATION USING FUNCTION sample_hp 

Male_founders<-data.frame(number=50,select="rnd") 
Female_founders<-data.frame(number=50,select="rnd")   

# Selection scheme in each generation of recent population 

Selection<-data.frame(matrix(NA, nrow=2, ncol=2))
names(Selection)<-c("Number","type")
Selection$Number[1:2]<-c(50,50)	
Selection$type[1:2]<-c("rnd","rnd")	
Selection

# Save "data" and "marker" for first and last generation of RP

my_files<-data.frame(matrix(NA, nrow=2, ncol=2))
names(my_files)<-c("data","marker")
my_files[,1]<-c(1,4) # Save data for generations 1 and 4
my_files[,2]<-c(1,4) # Save marker genotypes for generations 1 and 4
my_files

RP<-sample_hp(hp_out=hp,Male_founders=Male_founders,
Female_founders=Female_founders,ng=4,Selection=Selection,
litter_size=3,saveAt="my_RP",sh_output=my_files,Display=TRUE)

# Some results 

RP$summary_data
RP$output[[2]]$data      # Data for 1st Generation
RP$output[[4]]$freqMRK   # Marker frequencies for 3rd Generation
RP$linkage_map_qtl
RP$allele_effcts