simu.GEBVGD: Simulate Progeny with GEBV-GD Strategy
In IPLGP: Identification of Parental Lines via Genomic Prediction

Description

Identify parental lines with the GEBV-GD strategy and simulate their offspring.

Usage

simu.GEBVGD(
  fittedA.t,
  fittedD.t = NULL,
  fittedmu.t = NULL,
  geno.t,
  marker,
  geno.c = NULL,
  npl = NULL,
  better.c = FALSE,
  npl.best = NULL,
  weight = NULL,
  direction = NULL,
  outcross = FALSE,
  nprog = 50,
  nsele = NULL,
  ngen = 10,
  nrep = 30,
  cri = 10000,
  console = TRUE
)

Arguments

`fittedA.t`	matrix. An n*t matrix of fitted values for each trait in the training population, where n is the number of individuals and t is the number of traits. Missing values must already be imputed. If outcross is TRUE, this argument must be the additive-effect part of the fitted values.
`fittedD.t`	matrix. An n*t matrix of the dominance-effect part of the fitted values, used when outcross is TRUE. Missing values must already be imputed.
`fittedmu.t`	numeric or vector. A t*1 vector of the average fitted values, used when outcross is TRUE. Its length must equal the number of traits.
`geno.t`	matrix. An n*p marker score matrix of the training population. Markers must be coded as 1, 0, or -1 for genotypes AA, Aa, or aa. Missing values must already be imputed.
`marker`	matrix. A p*2 matrix whose first column gives the chromosome number of each marker and whose second column gives the marker position in centimorgans (cM).
`geno.c`	matrix. An nc*p marker score matrix of the candidate population, with nc individuals and p markers. The candidates should be pure lines, with markers coded as 1 or -1 for genotypes AA or aa. Missing values must already be imputed. If geno.c is NULL, the training population is used as the candidate population.
`npl`	integer. The number of individuals to be chosen as parental lines. If npl is NULL, it is set to 4 times the number of traits.
`better.c`	logical. If TRUE, only the candidate individuals whose GEBVs are better than average for all target traits make up the candidate set. Otherwise, all candidate individuals make up the candidate set.
`npl.best`	integer. The number of candidate individuals with the top GEBV index to retain. If npl.best is NULL, it is set to 2 times the number of traits.
`weight`	vector. A vector of length t giving the weights of the target traits in the selection index. If weight is NULL, equal weights are assigned to all target traits. Each weight should be a positive number.
`direction`	vector. A vector of length t giving the selection direction for each target trait. An element of Inf means the larger the better, -Inf means the smaller the better, and a finite number means that individuals with trait values close to that number are selected. If direction is NULL, the larger the better is used for all traits.
`outcross`	logical. If TRUE, the crop is regarded as an outcrossing crop. The kinship matrix of dominance effects is then also considered in the model, and crossing and selection are performed in the F1 generation. Details can be found in the References.
`nprog`	integer. The number of progeny produced for each of the best individuals in every generation.
`nsele`	integer. The number of best individuals selected in each generation. If nsele is NULL, it is set to the number of F1 individuals.
`ngen`	integer. The number of generations in the simulation process.
`nrep`	integer. The number of repetitions in the simulation process.
`cri`	integer. The stopping criterion; cri must be less than 1e+06. The genetic algorithm stops when the number of iterations reaches cri.
`console`	logical. If TRUE, the simulation process is shown in the R console.
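
The input requirements listed above can be checked before calling simu.GEBVGD. The following is a minimal sketch in base R on toy data, assuming an inbred analysis (outcross = FALSE). The helper name check_inputs and the toy objects are hypothetical and not part of IPLGP. The last lines write out the documented defaults for npl, npl.best, weight, and direction; the exact scaling of the equal weights is an assumption.

```r
# Hypothetical helper (not part of IPLGP): checks the documented input
# requirements for an inbred (outcross = FALSE) analysis.
check_inputs <- function(fittedA.t, geno.t, marker, geno.c = NULL) {
  fittedA.t <- as.matrix(fittedA.t)
  stopifnot(
    nrow(fittedA.t) == nrow(geno.t),     # n individuals in both
    ncol(geno.t) == nrow(marker),        # p markers in both
    ncol(marker) == 2,                   # chromosome, position (cM)
    all(geno.t %in% c(1, 0, -1)),        # AA, Aa, aa coding
    !anyNA(fittedA.t), !anyNA(geno.t)    # missing values already imputed
  )
  if (!is.null(geno.c)) {
    stopifnot(
      ncol(geno.c) == ncol(geno.t),      # same p markers
      all(geno.c %in% c(1, -1))          # pure lines: AA or aa only
    )
  }
  invisible(TRUE)
}

# Toy data: 10 training lines, 20 markers on 2 chromosomes, 2 traits
set.seed(1)
geno.toy   <- matrix(sample(c(1, -1), 200, replace = TRUE), 10, 20)
marker.toy <- cbind(rep(1:2, each = 10), rep(seq(0, 90, 10), 2))
fitted.toy <- matrix(rnorm(20), 10, 2)
check_inputs(fitted.toy, geno.toy, marker.toy)

# Documented defaults, written out for nt traits
nt        <- ncol(fitted.toy)
npl       <- 4 * nt            # default number of parental lines
npl.best  <- 2 * nt            # default number retained by GEBV index
weight    <- rep(1 / nt, nt)   # equal weights (scaling is an assumption)
direction <- rep(Inf, nt)      # larger is better for every trait
```

A check of this kind fails early with the offending condition named in the error message, which is usually easier to read than an error raised deep inside the simulation.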

Value

`method`	The GEBV-GD strategy.
`weight`	The weights of target traits in selection index.
`direction`	The selection directions of target traits in selection index.
`mu`	The mean vector of target traits.
`sd`	The standard deviation vector of target traits.
`GEBV.value`	The GEBVs of target traits in each generation and each repetition.
`parental.lines`	The IDs and D-scores of the parental lines selected in each repetition.
`suggested.subset`	The most frequently selected parental lines by this strategy.
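
As an illustration of what "most frequently selected" means for suggested.subset, the sketch below tallies made-up parental-line IDs across repetitions with base R. The object names and the data are hypothetical, and this is not claimed to be the package's internal computation.

```r
# Hypothetical selections from 4 repetitions (IDs are made up)
picked <- list(
  c(2, 5, 7, 9),
  c(2, 5, 8, 9),
  c(1, 5, 7, 9),
  c(2, 5, 7, 10)
)
npl <- 4

# Count how often each ID was chosen, most frequent first
freq <- sort(table(unlist(picked)), decreasing = TRUE)

# Keep the npl most frequently chosen IDs
top <- as.integer(names(freq)[seq_len(npl)])
```

With these toy selections, ID 5 is chosen in every repetition and IDs 2, 7, and 9 in three of four, so those four lines form the tallied subset.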

Note

The functions output.best and output.gain can be used to summarize the result.

The fitted values required as input can be obtained with the function GBLUP.fit or with mmer (from the sommer package), as shown in the Examples below.

References

Chung PY, Liao CT. 2020. Identification of superior parental lines for biparental crossing via genomic prediction. PLoS ONE 15(12):e0243159.

Examples

# generate simulated data
set.seed(6000)
geno.test <- matrix(sample(c(1, -1), 200, replace = TRUE), 10, 20)
t1 <- 5*geno.test[,3]+3*geno.test[,7]-geno.test[,11]+rnorm(10,30,10)
t2 <- 3*geno.test[,3]+geno.test[,12]-2*geno.test[,18]+rnorm(10,10,5)
t3 <- NULL
t4 <- NULL
t5 <- NULL
marker.test <- cbind(rep(1:2, each=10), rep(seq(0, 90, 10), 2))
fit <- GBLUP.fit(t1, t2, t3, t4, t5, geno = geno.test)
fitvalue <- fit$fitted.value

geno.candidate <- matrix(sample(c(1,-1), 300, replace = TRUE), 15, 20)

# run and output
result <- simu.GEBVGD(fitvalue, geno.t = geno.test, marker = marker.test,
                      geno.c = geno.candidate, nprog = 5, nsele = 10,
                      ngen = 5, nrep = 5, cri = 250)
result$suggested.subset



# other method: use mmer to obtain the fitted value
## Not run: 
set.seed(6000)
geno.test <- matrix(sample(c(1, -1), 200, replace = TRUE), 10, 20)
t1 <- 5*geno.test[,3]+3*geno.test[,7]-geno.test[,11]+rnorm(10,30,10)
t2 <- 3*geno.test[,3]+geno.test[,12]-2*geno.test[,18]+rnorm(10,10,5)
phe <- cbind(t1, t2)
nt <- ncol(phe)
marker.test <- cbind(rep(1:2, each=10), rep(seq(0, 90, 10), 2))
rownames(geno.test) <- 1:nrow(geno.test)
id <- rownames(geno.test)
K0 <- geno.test%*%t(geno.test)/ncol(geno.test)

dat <- data.frame(id, phe)
fit0 <- sommer::mmer(cbind(t1, t2)~1,
      random = ~sommer::vsr(id, Gu = K0, Gtc = sommer::unsm(nt)),
      rcov = ~sommer::vsr(units, Gtc = sommer::unsm(nt)),
      data = dat,
      tolParInv = 0.1)

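# predicted random effects for id, one list element per trait
u0 <- fit0$U$`u:id`
fit <- matrix(unlist(u0), ncol = nt)
colnames(fit) <- names(u0)

# add the fixed-effect fit of each trait (intercept-only model), then
# order the rows by numeric id
fit <- fit+matrix(fit0$fitted[1,], nrow(fit), nt, byrow = TRUE)
fitvalue <- fit[order(as.numeric(names(u0[[1]]))),]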

## End(Not run)

IPLGP documentation built on Sept. 11, 2024, 7:35 p.m.