gpData2data.frame: Merge of phenotypic and genotypic data
In synbreed: Framework for the Analysis of Genomic Prediction Data using R

Create a data.frame from the phenotypic and genotypic data in an object of class gpData by merging the data sets on their common id. The merged data set can include either only individuals with both phenotypes and genotypes (default) or, additionally, unphenotyped or ungenotyped individuals. In the latter cases, the missing observations are filled with NA.

gpData2data.frame(
  gpData,
  trait = 1,
  onlyPheno = FALSE,
  all.pheno = FALSE,
  all.geno = FALSE,
  repl = NULL,
  phenoCovars = TRUE,
  ...
)

`gpData`	object of class `gpData`
`trait`	`numeric` or `character`. A vector with the names or numbers of the traits that should be extracted from `pheno`. Default is `1`.
`onlyPheno`	scalar `logical`. Only return phenotypic data.
`all.pheno`	scalar `logical`. Include all individuals with phenotypes in the `data.frame` and fill the genotypic data with `NA`.
`all.geno`	scalar `logical`. Include all individuals with genotypes in the `data.frame` and fill the phenotypic data with `NA`.
`repl`	`character` or `numeric`. A vector with the names or numbers of the replications that should be drawn from the phenotypic values and covariates. Default is `NULL`, i.e. all values are used.
`phenoCovars`	`logical`. If `TRUE`, the columns with the phenotypic covariables from element `phenoCovars` are attached to the `data.frame`. Only required for repeated measurements.
`...`	further arguments to be used in function `reshape`. The argument `times` could be useful to rename the levels of the grouping variable (such as locations or environments).

Argument all.geno can be used to predict the genetic value of individuals without phenotypic records using the BGLR package. In that case, the genetic value of individuals with NA as phenotype is predicted from their marker profile.
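
The effect of all.pheno and all.geno can be pictured with base R's merge. The following is only a minimal sketch of the join semantics described above, not the package's internal implementation; the toy data and the object names (pheno, geno, both, allPheno, allGeno) are hypothetical.

```r
# Hypothetical toy data: individuals d..f are phenotyped, b..e are genotyped
pheno <- data.frame(ID = c("d", "e", "f"), Yield = c(101.2, 97.5, 103.8))
geno <- data.frame(
  ID = c("b", "c", "d", "e"),
  M1 = c(0, 1, 2, 1),
  M2 = c(2, 2, 0, 1)
)

# Analogue of the default: only individuals present in both data sets
both <- merge(pheno, geno, by = "ID")

# Analogue of all.pheno = TRUE: keep every phenotyped individual,
# marker columns are NA for the ungenotyped individual "f"
allPheno <- merge(pheno, geno, by = "ID", all.x = TRUE)

# Analogue of all.geno = TRUE: keep every genotyped individual,
# Yield is NA for the unphenotyped individuals "b" and "c"
allGeno <- merge(pheno, geno, by = "ID", all.y = TRUE)
```

The rows of allGeno with NA in Yield correspond to the individuals whose genetic value would be predicted from the markers.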

For multiple measures, the phenotypic data in object gpData is arranged with replicates in an array. With gpData2data.frame this can be reshaped to "long" format with multiple observations in one column. In this case, one column for the phenotype and two additional columns for the id and the levels of the grouping variable (such as replications, years or locations in multi-environment trials) are added.
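
The "long" layout can be illustrated with base R's reshape, the function that receives the further arguments passed via `...`. This is a minimal sketch on a hypothetical wide data.frame (the column names Trait.1 and Trait.2 and the labels env1 and env2 are assumptions for illustration); it shows how `times` renames the levels of the grouping variable.

```r
# Hypothetical wide-format phenotypes: one column per replication
wide <- data.frame(
  ID = c("a", "b", "c"),
  Trait.1 = c(1.2, 0.8, 1.5),
  Trait.2 = c(2.1, 1.9, 2.0)
)

# Stack the replication columns into a single phenotype column;
# 'times' supplies the labels used for the grouping variable
long <- reshape(
  wide,
  direction = "long",
  varying = c("Trait.1", "Trait.2"),
  v.names = "Trait",
  timevar = "repl",
  times = c("env1", "env2"),
  idvar = "ID"
)
rownames(long) <- NULL
long
```

The result has one row per individual and replication, with the id, the grouping variable repl and the phenotype Trait as columns.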

A data.frame with the individuals' names in the first column, the phenotypes in the next column(s) and the marker genotypes in the subsequent columns.

Valentin Wimmer and Hans-Juergen Auinger

create.gpData, reshape

# example data with unrepeated observations
set.seed(311)

# simulating genotypic and phenotypic data
pheno <- data.frame(Yield = rnorm(12, 100, 5), Height = rnorm(12, 100, 1))
rownames(pheno) <- letters[4:15]
geno <- matrix(sample(c("A", "A/B", "B", NA),
  size = 120, replace = TRUE,
  prob = c(0.6, 0.2, 0.1, 0.1)
), nrow = 10)
rownames(geno) <- letters[1:10]
colnames(geno) <- paste("M", 1:12, sep = "")
# different subset of individuals in pheno and geno

# create 'gpData' object
gp <- create.gpData(pheno = pheno, geno = geno)
summary(gp)
gp$covar

# as data.frame with individuals with genotypes and phenotypes
gpData2data.frame(gp, trait = 1:2)
# as data.frame with all individuals with phenotypes
gpData2data.frame(gp, 1:2, all.pheno = TRUE)
# as data.frame with all individuals with genotypes
gpData2data.frame(gp, 1:2, all.geno = TRUE)

# example with repeated observations
set.seed(311)

# simulating genotypic and phenotypic data
pheno <- data.frame(ID = letters[1:10], Trait = c(
  rnorm(10, 1, 2), rnorm(10, 2, 0.2),
  rbeta(10, 2, 4)
), repl = rep(1:3, each = 10))
geno <- matrix(rep(c(1, 0, 2), 10), nrow = 10)
colnames(geno) <- c("M1", "M2", "M3")
rownames(geno) <- letters[1:10]

# create 'gpData' object
gp <- create.gpData(pheno = pheno, geno = geno, repeated = "repl")

# reshape the phenotypic data and merge the genotypic data;
# levels of the grouping variable repl are named "a", "b" and "c"
gpData2data.frame(gp, onlyPheno = FALSE, times = letters[1:3])