The handling of missing values (i.e. NA
) depends on where they are.
It is perfectly valid to have missing vaules in the dependent variable. There is no need of removing those individuals from the dataset. Furthermore, including them will yield predictions for their phenotype, based on the predictive variables.
library(breedR) N <- 1e3 x <- rep(1:4, each = N/4) dat <- data.frame(y = x + rnorm(N), x = factor(letters[x])) dat$y[1] <- NA head(dat) res <- remlf90(y ~ x, data = dat) ## The predicted phenotype for y[1] is the estimated effect ## of the corresponding level of x fitted(res)[1] == fixef(res)$x['a']
This is not allowed, as it would yield an underdetermined system of equations.
breedR
issues an error if missing values are detected.
N <- 1e3 x <- rep(1:4, each = N/4) dat <- data.frame(y = x + rnorm(N), x = factor(letters[x])) dat$x[c(1, 3, 5)] <- NA head(dat) res <- remlf90(y ~ x, data = dat)
Idem for a regression variable.
N <- 1e3 x <- runif(N) dat <- data.frame(y = 1 + 2*x + rnorm(N), x = x) dat$x[c(1, 3, 5)] <- NA head(dat) res <- remlf90(y ~ x, data = dat)
These are allowed. The incidence matrix will have a row of zeros for the corresponding individual.
N <- 1e3 N.blk <- 20 blk.effects <- rnorm(N.blk, sd = 2) blk.idx <- sample(seq_len(N.blk), N, replace = TRUE) dat <- data.frame(y = 1 + blk.effects[blk.idx] + rnorm(N), blk = factor(blk.idx)) dat$blk[1] <- NA head(dat) res <- remlf90(y ~ 1, random = ~ blk, data = dat) sum(model.matrix(res)$blk[1,])
As a consequence, the predicted phenotype will be based on the remaining available effects. In this case, the global mean.
fitted(res)[1] == fixef(res)$Intercept[1]
The spatial block effect is another way of writing the previous experiment. So it works in the same way.
coord <- expand.grid(row = 1:20, col = 1:50) res <- remlf90(y ~ 1, spatial = list(model = 'blocks', coord = coord, id = 'blk'), data = dat) c(sum(model.matrix(res)$spatial[1,]) == 0, fitted(res)[1] == fixef(res)$Intercept[1])
However, the empirical residuals of the individuals with missing values of the random effects will have an increased variance. We can show that by replicating the previous experiment and computing the variance of the residual for the first observation.
sample_first_residual <- function(N = 1e3, N.blk = 20) { blk.effects <- rnorm(N.blk, sd = 2) blk.idx <- sample(seq_len(N.blk), N, replace = TRUE) dat <- data.frame(y = 1 + blk.effects[blk.idx] + rnorm(N), blk = factor(blk.idx)) dat$blk[1] <- NA res <- suppressMessages(remlf90(y ~ 1, random = ~ blk, data = dat)) return(unname(resid(res)[1])) }
resid_sample <- replicate(1e2, sample_first_residual()) var(resid_sample)
This can be important when fitting several random effects. See below.
For an additive genetic effect, the relationship between individuals is given in the pedigree. It is legitimate not knowing the relatives for some individual. This is what happens with founders, for example.
Use NA
for unknown relatives.
If both are unknown (e.g. founders), the genetic effect (Breeding Value) will be
predicted based on its phenotype, the other effects, and the estimated
heritability.
dat <- breedR.sample.phenotype( fixed = c(mu = 10, x = 2), genetic = list(model = 'add_animal', Nparents = c(10, 10), sigma2_a = 2, check.factorial = FALSE), N = 1e3) head(dat) res <- remlf90(phenotype ~ 1 + X.x, genetic = list(model = 'add_animal', pedigree = dat[, 1:3], id = 'self'), data = dat) str(ranef(res)$genetic)
Important issue Having random effects with missing values in combination with genetic models, can yield spurious predictions of Breeding Values. This is due to the higher variability of the residual term, for the individuals with missing values in random effects.
Are allowed. Just like in any other random effect. For those cases, the spatial component will not participate in the prediction.
dat <- breedR.sample.phenotype( fixed = c(mu = 10, x = 2), spatial = list(model = 'AR', grid.size = c(10, 5), rho = c(.2, .8), sigma2_s = 1) ) dat$Var1[1] <- NA head(dat) res <- remlf90(phenotype ~ 1 + X.x, spatial = list(model = 'AR', coord = dat[, c('Var1', 'Var2')], rho = c(0.2, 0.8)), data = dat) sum(model.matrix(res)$spatial[1,])
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.