ExportGAPIT: Export RADdata Object for Use by Other R Packages

View source: R/data_export.R

ExportGAPITR Documentation

Export RADdata Object for Use by Other R Packages

Description

After a "RADdata" object has been run through a pipeline such as IteratePopStruct, these functions can be used to export the genotypes to R packages and other software that can perform genome-wide association and genomic prediction. ExportGAPIT, Export_rrBLUP_Amat, Export_rrBLUP_GWAS, Export_GWASpoly, and Export_TASSEL_Numeric all export continuous numerical genotypes generated by GetWeightedMeanGenotypes. Export_polymapR, Export_Structure, and Export_adegenet_genind use GetProbableGenotypes to export discrete genotypes. Export_MAPpoly and Export_polymapR_probs export genotype posterior probabilities.

Usage

ExportGAPIT(object, onePloidyPerAllele = FALSE)

Export_rrBLUP_Amat(object, naIfZeroReads = FALSE, 
                   onePloidyPerAllele = FALSE)

Export_rrBLUP_GWAS(object, naIfZeroReads = FALSE, 
                   onePloidyPerAllele = FALSE)

Export_TASSEL_Numeric(object, file, naIfZeroReads = FALSE,
                      onePloidyPerAllele = FALSE)

Export_polymapR(object, naIfZeroReads = TRUE,
                progeny = GetTaxa(object)[!GetTaxa(object) %in% 
                  c(GetDonorParent(object), GetRecurrentParent(object),
                    GetBlankTaxa(object))])

Export_polymapR_probs(object, maxPcutoff = 0.9,
                      correctParentalGenos = TRUE,
                      multiallelic = "correct")

Export_MAPpoly(object, file, pheno = NULL, ploidyIndex = 1,
               progeny = GetTaxa(object)[!GetTaxa(object) %in% 
                 c(GetDonorParent(object), GetRecurrentParent(object),
                   GetBlankTaxa(object))],
               digits = 3)

Export_GWASpoly(object, file, naIfZeroReads = TRUE, postmean = TRUE, digits = 3,
                splitByPloidy = TRUE)

Export_Structure(object, file, includeDistances = FALSE, extraCols = NULL,
                 missingIfZeroReads = TRUE)

Export_adegenet_genind(object, ploidyIndex = 1)

Arguments

object

A "RADdata" object with posterior genotype probabilities already estimated.

onePloidyPerAllele

Logical. If TRUE, for each allele the inheritance mode with the lowest \chi ^ 2 value is selected and is assumed to be the true inheritance mode. If FALSE, inheritance modes are weighted by inverse \chi ^ 2 values for each allele, and mean genotypes that have been weighted across inheritance modes are returned.

naIfZeroReads

A logical indicating whether NA should be inserted into the output matrix for any taxa and loci where the total read depth for the locus is zero. If FALSE, the output for these genotypes is essentially the mode (for Export_polymapR and Export_GWASpoly) or mean (for others) across prior genotype probabilities, since prior and posterior genotype probabilities are equal when there are no reads.

file

A character string indicating a file path to which to write.

pheno

A data frame or matrix of phenotypic values, with progeny in rows and traits in columns. Columns should be named.

ploidyIndex

Index, within object$possiblePloidies, of the ploidy to be exported.

progeny

A character vector indicating which individuals to export as progeny of the cross.

maxPcutoff

A cutoff for posterior probabilities, below which genotypes will be reported as 'NA' in the 'geno' column.

correctParentalGenos

Passed to GetProbableGenotypes. If TRUE, parental genotypes are corrected based on progeny allele frequencies.

multiallelic

Passed to GetProbableGenotypes. Under the default, genotypes at multiallelic loci will be corrected to sum to the ploidy.

digits

Number of decimal places to which to round genotype probabilities or posterior mean genotypes in the output file.

postmean

Logical. If TRUE, posterior mean genotypes will be output. If FALSE, discrete genotypes will be output.

splitByPloidy

Logical. If TRUE and there are multiple taxaPloidy values in the dataset, multiple files are written, one per ploidy.

includeDistances

Logical. If TRUE, the second row of the Structure file will contain distances between markers, which can be used by the linkage model in Structure.

extraCols

An optional data frame, with one row per taxon, containing columns of data to output to the left of the genotypes in the Structure file.

missingIfZeroReads

See naIfZeroReads. If TRUE, a value of -9 will be output for any genotypes with zero reads, indicating that those genotypes are missing.

Details

GAPIT, FarmCPU, rrBLUP, TASSEL, and GWASpoly allow genotypes to be a continuous numeric variable. MAPpoly and polymapR allow for import of genotype probabilities. GAPIT does not allow missing data, hence there is no naIfZeroReads argument for ExportGAPIT. Genotypes are exported on a scale of -1 to 1 for rrBLUP, on a scale of 0 to 2 for GAPIT and FarmCPU, and on a scale of 0 to 1 for TASSEL.

For all functions except Export_Structure and Export_adegenet_genind, one allele per marker is dropped. Export_MAPpoly also drops alleles where one or both parental genotypes could not be determined, and where both parents are homozygotes.

For ExportGAPIT and Export_rrBLUP_GWAS, chromosome and position are filled with dummy data if they do not exist in object$locTable. For Export_TASSEL_Numeric, allele names are exported, but no chromosome or position information per se.

If the chromosomes in object$locTable are in character format, ExportGAPIT, Export_MAPpoly, and Export_GWASpoly will attempt to extract chromosome numbers.

For polymapR there must only be one possible inheritance mode across loci (one value in object$possiblePloidies) in the RADdata object, although triploid F1 populations derived from diploid and tetraploid parents are allowed. See SubsetByPloidy for help reducing a RADdata object to a single inheritance mode.

MAPpoly only allows one ploidy, but Export_MAPpoly allows the user to select which inheritance mode from the RADdata object to use. (This is due to how internal polyRAD functions are coded.)

Value

For ExportGAPIT, a list:

GD

A data frame with taxa in the first column and alleles (markers) in subsequent columns, containing the genotypes. To be passed to the GD argument for GAPIT or FarmCPU.

GM

A data frame with the name, chromosome number, and position of every allele (marker). To be passed to the GM argument for GAPIT or FarmCPU.

For Export_rrBLUP_Amat, a matrix with taxa in rows and alleles (markers) in columns, containing genotype data. This can be passed to A.mat in rrBLUP.

For Export_rrBLUP_GWAS, a data frame with alleles (markers) in rows. The first three columns contain the marker names, chromosomes, and positions, and the remaining columns each represent one taxon and contain the genotype data. This can be passed to the GWAS function in rrBLUP.

Export_TASSEL_Numeric and Export_MAPpoly write a file but does not return an object.

For Export_polymapR, a matrix of integers indicating the most probable allele copy number, with markers in rows and individuals in columns. The parents are listed first, followed by all progeny.

For Export_polymapR_probs, a data frame suitable to pass to the probgeno_df argument of checkF1. Note that under default parameters, in some cases the maxP, maxgeno, and geno columns may not actually reflect the maximum posterior probability if genotype correction was performed.

For Export_adegenet_genind, a "genind" object.

Export_MAPpoly, Export_GWASpoly, and Export_Structure write files but do not return an object. Files output by Export_GWASpoly are comma delimited and in numeric format. Sample and locus names are included in the file output by Export_Structure, and the number of rows for each sample is equal to the highest ploidy as determined by the taxaPloidy slot and the output of GetProbableGenotypes.

Note

rrBLUP and polymapR are available through CRAN, and GAPIT and FarmCPU must be downloaded from the Zhang lab website. MAPpoly is available on GitHub but not yet on CRAN. GWASpoly is available from GitHub.

In my experience with TASSEL 5, numerical genotype files that are too large do not load/display properly. If you run into this problem I recommend using SplitByChromosome to split your RADdata object into multiple smaller objects, which can then be exported to separate files using Export_TASSEL_Numeric. If performing GWAS, you may also need to compute a kinship matrix using separate software such as rrBLUP.

Author(s)

Lindsay V. Clark

References

GAPIT and FarmCPU:

https://zzlab.net/GAPIT/

Lipka, A. E., Tian, F., Wang, Q., Peiffer, J., Li, M., Bradbury, P. J., Gore, M. A., Buckler, E. S. and Zhang, Z. (2012) GAPIT: genome association and prediction integrated tool. Bioinformatics 28, 2397–2399.

https://zzlab.net/FarmCPU/

Liu, X., Huang, M., Fan, B., Buckler, E. S., Zhang, Z. (2016) Iterative usage of fixed and random effects models for powerful and efficient genome-wide association studies. PLoS Genetics 12, e1005767.

rrBLUP:

Endelman, J.B. (2011) Ridge Regression and Other Kernels for Genomic Selection with R Package rrBLUP. The Plant Genome 4, 250–255.

TASSEL:

https://www.maizegenetics.net/tassel

Bradbury, P. J., Zhang, Z., Kroon, D. E., Casstevens, T. M., Ramdoss, Y. and Buckler, E. S. (2007) TASSEL: Software for association mapping of complex traits in diverse samples. Bioinformatics 23, 2633–2635.

polymapR:

Bourke, P., van Geest, G., Voorrips, R. E., Jansen, J., Kranenberg, T., Shahin, A., Visser, R. G. F., Arens, P., Smulders, M. J. M. and Maliepaard, C. (2018) polymapR: linkage analysis and genetic map construction from F1 populations of outcrossing polyploids. Bioinformatics 34, 3496–3502.

MAPpoly:

https://github.com/mmollina/MAPpoly

Mollinari, M. and Garcia, A. A. F. (2018) Linkage analysis and haplotype phasing in experimental autopolyploid populations with high ploidy level using hidden Markov models. bioRxiv doi: https://doi.org/10.1101/415232.

GWASpoly:

https://github.com/jendelman/GWASpoly

Rosyara, U. R., De Jong, W. S., Douches, D. S., and Endelman, J. B. (2016) Software for Genome-Wide Association Studies in Autopolyploids and Its Application to Potato. Plant Genome 9.

Structure:

https://web.stanford.edu/group/pritchardlab/structure.html

Hubisz, M. J., Falush, D., Stephens, M. and Pritchard, J. K. (2009) Inferring weak population structure with the assistance of sample group information. Molecular Ecology Resources 9, 1322–1332.

Falush, D., Stephens, M. and Pritchard, J. K. (2007) Inferences of population structure using multilocus genotype data: dominant markers and null alleles. Molecular Ecology Notes 7, 574–578

Falush, D., Stephens, M. and Pritchard, J. K. (2003) Inferences of population structure using multilocus genotype data: linked loci and correlated allele frequencies. Genetics 164, 1567–1587.

Pritchard, J. K., Stephens, M. and Donnelly, P. (2000) Inference of population structure using multilocus genotype data. Genetics 155, 945–959.

See Also

GetWeightedMeanGenotypes, RADdata2VCF

Examples

# load example dataset
data(exampleRAD)
# get genotype posterior probabilities
exampleRAD <- IterateHWE(exampleRAD)

# export to GAPIT
exampleGAPIT <- ExportGAPIT(exampleRAD)

# export to rrBLUP
example_rrBLUP_A <- Export_rrBLUP_Amat(exampleRAD)
example_rrBLUP_GWAS <- Export_rrBLUP_GWAS(exampleRAD)

# export to TASSEL
outfile <- tempfile() # temporary file for example
Export_TASSEL_Numeric(exampleRAD, outfile)

# for mapping populations
data(exampleRAD_mapping)

# specify donor and recurrent parents
exampleRAD_mapping <- SetDonorParent(exampleRAD_mapping, "parent1")
exampleRAD_mapping <- SetRecurrentParent(exampleRAD_mapping, "parent2")

# run the pipeline
exampleRAD_mapping <- PipelineMapping2Parents(exampleRAD_mapping)

# convert to polymapR format
examplePMR <- Export_polymapR(exampleRAD_mapping)

examplePMR2 <- Export_polymapR_probs(exampleRAD_mapping)

# export to MAPpoly
outfile2 <- tempfile() # temporary file for example
# generate a dummy phenotype matrix containing random numbers
mypheno <- matrix(rnorm(200), nrow = 100, ncol = 2,
                  dimnames = list(GetTaxa(exampleRAD_mapping)[-(1:2)],
                                  c("Height", "Yield")))
Export_MAPpoly(exampleRAD_mapping, file = outfile2, pheno = mypheno)

# load data into MAPpoly
# require(mappoly)
# mydata <- read_geno_prob(outfile2)

# export to GWASpoly
outfile3 <- tempfile() # temporary file for example
Export_GWASpoly(SubsetByPloidy(exampleRAD, list(2)), outfile3)

# export to Structure
outfile4 <- tempfile() # temporary file for example
Export_Structure(exampleRAD, outfile4)

# export to adegenet
if(requireNamespace("adegenet", quietly = TRUE)){
  mygenind <- Export_adegenet_genind(exampleRAD)
}

lvclark/polyRAD documentation built on Jan. 15, 2024, 4:19 a.m.