poppr: Genetic Analysis of Populations with Mixed Reproduction

Documented in ia locus_table pair.ia poppr poppr.all private_alleles

#!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!#
#!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!#
#!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!#
#!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!#
#!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!#
#
# This software was authored by Zhian N. Kamvar and Javier F. Tabima, graduate 
# students at Oregon State University; Jonah C. Brooks, undergraduate student at
# Oregon State University; and Dr. Nik Grünwald, an employee of USDA-ARS.
#
# Permission to use, copy, modify, and distribute this software and its
# documentation for educational, research and non-profit purposes, without fee, 
# and without a written agreement is hereby granted, provided that the statement
# above is incorporated into the material, giving appropriate attribution to the
# authors.
#
# Permission to incorporate this software into commercial products may be
# obtained by contacting USDA ARS and OREGON STATE UNIVERSITY Office for 
# Commercialization and Corporate Development.
#
# The software program and documentation are supplied "as is", without any
# accompanying services from the USDA or the University. USDA ARS or the 
# University do not warrant that the operation of the program will be 
# uninterrupted or error-free. The end-user understands that the program was 
# developed for research purposes and is advised not to rely exclusively on the 
# program for any reason.
#
# IN NO EVENT SHALL USDA ARS OR OREGON STATE UNIVERSITY BE LIABLE TO ANY PARTY 
# FOR DIRECT, INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES, INCLUDING
# LOST PROFITS, ARISING OUT OF THE USE OF THIS SOFTWARE AND ITS DOCUMENTATION, 
# EVEN IF THE OREGON STATE UNIVERSITY HAS BEEN ADVISED OF THE POSSIBILITY OF 
# SUCH DAMAGE. USDA ARS OR OREGON STATE UNIVERSITY SPECIFICALLY DISCLAIMS ANY 
# WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF 
# MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE AND ANY STATUTORY 
# WARRANTY OF NON-INFRINGEMENT. THE SOFTWARE PROVIDED HEREUNDER IS ON AN "AS IS"
# BASIS, AND USDA ARS AND OREGON STATE UNIVERSITY HAVE NO OBLIGATIONS TO PROVIDE
# MAINTENANCE, SUPPORT, UPDATES, ENHANCEMENTS, OR MODIFICATIONS. 
#
#!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!#
#!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!#
#!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!#
#!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!#
#!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!#

#' Produce a basic summary table for population genetic analyses.
#' 
#' @md
#' @description
#' 
#' For the \pkg{poppr} package description, please see `package?poppr`.
#' 
#' This function allows the user to quickly view indices of heterozygosity,
#' evenness, and linkage to aid in the decision of a path to further analyze a
#' specified dataset. It natively takes [adegenet::genind] and
#' [genclone][genclone-class] objects, but can convert any raw data formats
#' that adegenet can take (fstat, structure, genetix, and genpop) as well as
#' genalex files exported into a csv format (see [read.genalex()] for details).
#' 
#' 
#' @param dat a [adegenet::genind] object OR a [genclone][genclone-class]
#'   object OR any fstat, structure, genetix, genpop, or genalex formatted
#'   file.
#' @param total When `TRUE` (default), indices will be calculated for the 
#'   pooled populations.
#' @param sublist a list of character strings or integers to indicate specific 
#'   population names (accessed via [adegenet::popNames()]). 
#'   Defaults to "ALL".
#' @param exclude a `vector` of population names or indexes that the user
#'   wishes to discard. Defaults to `NULL`.
#' @param blacklist DEPRECATED, use exclude.
#' @param sample an integer indicating the number of permutations desired to 
#'   obtain p-values. Sampling will shuffle genotypes at each locus to simulate
#'   a panmictic population using the observed genotypes. The observed
#'   statistic is counted among the permuted values when the p-value is
#'   calculated, so choose a sample number one less than a round number to get
#'   a round p-value (e.g. `sample = 999` gives a minimum of p = 0.001, while
#'   `sample = 1000` gives a minimum of p = 0.000999001).
#' @param method an integer from 1 to 4 indicating the method of sampling 
#'   desired. see [shufflepop()] for details.
#' @param missing how should missing data be treated? `"zero"` and 
#'   `"mean"` will set the missing values to those documented in 
#'   [tab()]. `"loci"` and `"geno"` will remove any loci or
#'   genotypes with missing data, respectively (see [missingno()] for
#'   more information).
#' @param cutoff `numeric` a number from 0 to 1 indicating the proportion of
#'   missing data allowed for analysis. This is to be used in conjunction with 
#'   the flag `missing` (see [missingno()] for details).
#' @param quiet `FALSE` (default) will display a progress bar for each 
#'   population analyzed.
#' @param clonecorrect default `FALSE`. must be used with the `strata`
#'   parameter, or the user will potentially get undesired results. see
#'   [clonecorrect()] for details.
#' @param strata a `formula` indicating the hierarchical levels to be used.
#'   The hierarchies should be present in the `strata` slot. See
#'   [strata()] for details.
#' @param keep an `integer`. This indicates which strata you wish to keep 
#'   after clone correcting your data sets. To combine strata, just set keep 
#'   from 1 to the number of stratifications set in strata. See 
#'   [clonecorrect()] for details.
#' @param plot `logical` if `TRUE` (default) and `sample > 0`, 
#'   a histogram will be produced for each population.
#' @param hist `logical` Deprecated. Use plot.
#' @param index `character` Either "Ia" or "rbarD". If `plot = TRUE`, 
#'   this will determine the index used for the visualization.
#' @param minsamp an `integer` indicating the minimum number of individuals
#'   to resample for rarefaction analysis. See [vegan::rarefy()] for 
#'   details.
#' @param legend `logical`. When this is set to `TRUE`, a legend describing the
#'   resulting table columns will be printed. Defaults to `FALSE`.
#' @param ... arguments to be passed on to [diversity_stats()]
#'
#' @return A data frame with populations in rows and the following columns:
#' - **Pop**: A vector indicating the population factor 
#' - **N**: An integer vector indicating the number of individuals/isolates in
#'   the specified population.
#' - **MLG**: An integer vector indicating the number of multilocus genotypes
#'   found in the specified population, (see: [mlg()])
#' - **eMLG**: The expected number of MLG at the lowest common sample size (set
#'   by the parameter `minsamp`).
#' - **SE**: The standard error for the rarefaction analysis
#' - **H**: Shannon-Wiener Diversity index
#' - **G**: Stoddart and Taylor's Index 
#' - **lambda**: Simpson's index 
#' - **E.5**: Evenness 
#' - **Hexp**: Nei's gene diversity (expected heterozygosity)
#' - **Ia**: A numeric vector giving the value of the Index of Association for
#'   each population factor, (see [ia()]).
#' - **p.Ia**: A numeric vector indicating the p-value for Ia from the number
#'   of reshufflings indicated in `sample`. Lowest value is 1/n where n is the
#'   number of observed values.
#' - **rbarD**: A numeric vector giving the value of the Standardized Index of
#'   Association for each population factor, (see [ia()]).
#' - **p.rD**: A numeric vector indicating the p-value for rbarD from the
#'   number of reshuffles indicated in `sample`. Lowest value is 1/n where n is
#'   the number of observed values.
#' - **File**: A vector indicating the name of the original data file.
#'
#' @details 
#'
#' This table is intended to be a first look into the dynamics of multilocus
#' genotype diversity. Many of the statistics (except for the index of
#' association) are simply based on counts of multilocus genotypes and do not
#' take into account the actual allelic states. **Descriptions of the
#' statistics can be found in the Algorithms and Equations vignette**:
#' `vignette("algo", package = "poppr")`.
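#'
#' As a rough illustration of what "based on counts of multilocus genotypes"
#' means, the sketch below computes H, G, lambda, and E.5 from a hypothetical
#' vector of MLG counts using the standard textbook definitions. The name
#' `mlg_counts` and its values are assumptions made for this example, and the
#' vignette remains the authoritative description of what poppr computes.
#'
#' ```r
#' mlg_counts <- c(5, 3, 1, 1)        # hypothetical counts of four MLGs
#' p      <- mlg_counts / sum(mlg_counts)
#' H      <- -sum(p * log(p))         # Shannon-Wiener index
#' lambda <- 1 - sum(p^2)             # Simpson's index
#' G      <- 1 / sum(p^2)             # Stoddart and Taylor's index
#' E.5    <- (G - 1) / (exp(H) - 1)   # evenness
#' round(c(H = H, G = G, lambda = lambda, E.5 = E.5), 3)
#' ```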
#'
#' ## sampling
#'
#' The sampling procedure is explicitly for testing the index of association.
#' None of the other diversity statistics (H, G, lambda, E.5) are tested with
#' this sampling due to the differing data types. To obtain confidence
#' intervals for these statistics, please see [diversity_ci()].
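#'
#' The one-sided p-value counts the observed value among the permuted values,
#' which is why the smallest attainable p-value is `1/(sample + 1)`. A minimal
#' sketch of that calculation, assuming a hypothetical observed value and a
#' stand-in vector of permuted values (neither comes from real data):
#'
#' ```r
#' observed <- 0.25                              # hypothetical observed value
#' permuted <- seq(-0.1, 0.1, length.out = 999)  # stand-in for 999 permuted values
#' # proportion of values (observed included) at least as large as the observed
#' p_value  <- sum(observed <= c(permuted, observed)) / (length(permuted) + 1)
#' p_value
#' ```
#'
#' Here no permuted value reaches the observed one, so the result is the
#' minimum possible with 999 permutations.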
#'
#' ## rarefaction
#'
#' Rarefaction analysis is performed on the number of multilocus genotypes
#' because it is relatively easy to estimate (Grünwald et al., 2003). To
#' obtain rarefied estimates of diversity, it is possible to use
#' [diversity_ci()] with the argument `rarefy = TRUE`.
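#'
#' For intuition, the expected number of MLGs in a subsample of size `n` can
#' be sketched in base R with Hurlbert's (1971) formula. The function name
#' `rarefy_mlg()` and the counts are hypothetical, and the sketch leaves out
#' the standard error that [vegan::rarefy()] can also report.
#'
#' ```r
#' rarefy_mlg <- function(mlg_counts, n) {
#'   N <- sum(mlg_counts)
#'   stopifnot(n <= N)
#'   # for each MLG, 1 - P(that MLG is absent from a subsample of size n)
#'   sum(1 - exp(lchoose(N - mlg_counts, n) - lchoose(N, n)))
#' }
#' rarefy_mlg(c(5, 3, 1, 1), n = 5)   # expected MLGs among 5 of the 10 samples
#' rarefy_mlg(c(5, 3, 1, 1), n = 10)  # at n = N all four observed MLGs are expected
#' ```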
#'
#' ## graphic
#'
#' This function outputs a \pkg{ggplot2} graphic of histograms. These can be
#' manipulated to be visualized in another manner by retrieving the plot with
#' the [last_plot()] command from \pkg{ggplot2}. A useful manipulation would
#' be to arrange the graphs into a single column so that the values of the
#' statistic line up: `p <- last_plot(); p + facet_wrap(~population,
#' ncol = 1, scales = "free_y")`. The name for the groupings is
#' "population" and the name for the x axis is "value".
#'
#' @note The calculation of `Hexp` has changed from \pkg{poppr} 1.x. It was
#'   previously calculated based on the diversity of multilocus genotypes, 
#'   resulting in a value of 1 for sexual populations. This was obviously not 
#'   Nei's 1978 expected heterozygosity. We have thus changed the statistic to 
#'   be the true value of Hexp by calculating \eqn{\left(\frac{n}{n-1}\right)
#'   \left(1 - \sum_{i = 1}^k{p^{2}_{i}}\right)}{(n/(n - 1))*(1 - sum(p^2))}
#'   at each locus, where p is the vector of allele frequencies at that locus
#'   and n is the number of observed alleles (Nei, 1978), and then returning
#'   the average over loci. Caution should be 
#'   exercised in interpreting the results of Hexp with polyploid organisms with
#'   ambiguous ploidy. The lack of allelic dosage information will cause rare 
#'   alleles to be over-represented and artificially inflate the index. This is 
#'   especially true with small sample sizes.
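#'
#' A minimal sketch of this calculation, assuming hypothetical allele counts
#' at two loci (`hexp_locus()` and `loci` are names invented for this example
#' and are not part of the package):
#'
#' ```r
#' hexp_locus <- function(allele_counts) {
#'   n <- sum(allele_counts)          # number of observed alleles at the locus
#'   p <- allele_counts / n           # allele frequencies
#'   (n / (n - 1)) * (1 - sum(p^2))
#' }
#' loci <- list(locus1 = c(6, 4), locus2 = c(5, 3, 2))
#' mean(vapply(loci, hexp_locus, numeric(1)))  # average over loci
#' ```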
#'
#' @seealso [clonecorrect()], 
#'   [poppr.all()], 
#'   [ia()], 
#'   [missingno()], 
#'   [mlg()], 
#'   [diversity_stats()],
#'   [diversity_ci()]
#'
#' @export
#' @author Zhian N. Kamvar
#' @references  Paul-Michael Agapow and Austin Burt. Indices of multilocus 
#'   linkage disequilibrium. _Molecular Ecology Notes_, 1(1-2):101-102, 
#'   2001
#'
#'   A.H.D. Brown, M.W. Feldman, and E. Nevo. Multilocus structure of natural 
#'   populations of _Hordeum spontaneum_. _Genetics_, 96(2):523-536,
#'   1980.
#'
#'   Niklaus J. Grünwald, Stephen B. Goodwin, Michael G. Milgroom, and William
#'   E. Fry. Analysis of genotypic diversity data for populations of 
#'   microorganisms. Phytopathology, 93(6):738-46, 2003
#'
#'   Bernhard Haubold and Richard R. Hudson. Lian 3.0: detecting linkage 
#'   disequilibrium in multilocus data. Bioinformatics, 16(9):847-849, 2000.
#'
#'   Kenneth L. Heck Jr., Gerald van Belle, and Daniel Simberloff. Explicit 
#'   calculation of the rarefaction diversity measurement and the determination 
#'   of sufficient sample size. Ecology, 56(6):1459-1461, 1975
#'
#'   Masatoshi Nei. Estimation of average heterozygosity and genetic distance 
#'   from a small number of individuals. Genetics, 89(3):583-590, 1978.
#'
#'   S H Hurlbert. The nonconcept of species diversity: a critique and 
#'   alternative parameters. Ecology, 52(4):577-586, 1971.
#'
#'   J.A. Ludwig and J.F. Reynolds. Statistical Ecology. A Primer on Methods and
#'   Computing. New York USA: John Wiley and Sons, 1988.
#'
#'   Simpson, E. H. Measurement of diversity. Nature 163: 688, 1949 
#'   doi:10.1038/163688a0
#'
#'   Good, I. J. (1953). On the Population Frequency of Species and the 
#'   Estimation of Population Parameters. _Biometrika_ 40(3/4): 237-264.
#'
#'   Lande, R. (1996). Statistics and partitioning of species diversity, and 
#'   similarity among multiple communities. _Oikos_ 76: 5-13.
#'
#'   Jari Oksanen, F. Guillaume Blanchet, Roeland Kindt, Pierre Legendre, Peter 
#'   R. Minchin, R. B. O'Hara, Gavin L. Simpson, Peter Solymos, M. Henry H. 
#'   Stevens, and Helene Wagner. vegan: Community Ecology Package, 2012. R 
#'   package version 2.0-5.
#'
#'   E.C. Pielou. Ecological Diversity. Wiley, 1975.
#'
#'   Claude Elwood Shannon. A mathematical theory of communication. Bell System
#'   Technical Journal, 27:379-423,623-656, 1948
#'
#'   J M Smith, N H Smith, M O'Rourke, and B G Spratt. How clonal are bacteria? 
#'   Proceedings of the National Academy of Sciences, 90(10):4384-4388, 1993.
#'
#'   J.A. Stoddart and J.F. Taylor. Genotypic diversity: estimation and 
#'   prediction in samples. Genetics, 118(4):705-11, 1988.
#'
#'
#' @examples
#' data(nancycats)
#' poppr(nancycats)
#' 
#' \dontrun{
#' # Sampling
#' poppr(nancycats, sample = 999, total = FALSE, plot = TRUE)
#' 
#' # Customizing the plot
#' library("ggplot2")
#' p <- last_plot()
#' p + facet_wrap(~population, scales = "free_y", ncol = 1)
#' 
#' # Turning off diversity statistics (see diversity_stats)
#' poppr(nancycats, total=FALSE, H = FALSE, G = FALSE, lambda = FALSE, E5 = FALSE)
#' 
#' # Previous versions of poppr defined Hexp as (N/(N - 1))*lambda, which is
#' # an unbiased version of Simpson's index. The statistic was included in
#' # poppr because it was part of the program multilocus, and it was later
#' # recognized as an unbiased Simpson's diversity metric (Lande, 1996; Good,
#' # 1953). It can still be calculated by passing a custom function:
#' 
#' data(Aeut)
#' 
#' uSimp <- function(x){
#'   lambda <- vegan::diversity(x, "simpson")
#'   x <- drop(as.matrix(x))
#'   if (length(dim(x)) > 1){
#'     N <- rowSums(x)
#'   } else {
#'     N <- sum(x)
#'   }
#'   return((N/(N-1))*lambda)
#' }
#' poppr(Aeut, uSimp = uSimp)
#' 
#' 
#' # Demonstration with viral data
#' # Note: this is a larger data set that could take a couple of minutes to run
#' # on slower computers. 
#' data(H3N2)
#' strata(H3N2) <- data.frame(other(H3N2)$x)
#' setPop(H3N2) <- ~country
#' poppr(H3N2, total = FALSE, sublist=c("Austria", "China", "USA"), 
#'   clonecorrect = TRUE, strata = ~country/year)
#' }
#' @import adegenet ggplot2 vegan
poppr <- function(dat, total = TRUE, sublist = "ALL", exclude = NULL, blacklist = NULL, 
                  sample = 0, method = 1, missing = "ignore", cutoff = 0.05, 
                  quiet = FALSE, clonecorrect = FALSE, strata = 1, keep = 1, 
                  plot = TRUE, hist = TRUE, index = "rbarD", minsamp = 10, 
                  legend = FALSE, ...){

  if (inherits(dat, c("genlight", "snpclone"))){
    msg <- "The poppr function will not work with genlight or snpclone objects"
    msg <- paste0(msg, "\nIf you want to calculate genotypic diversity, use ",
                  "the function diversity_stats().")
    stop(msg)
  }
  quiet <- should_poppr_be_quiet(quiet)
  x <- process_file(dat, missing = missing, cutoff = cutoff, 
                    clonecorrect = clonecorrect, strata = strata,
                    keep = keep, quiet = TRUE)  
  # The namelist will contain information such as the filename and population
  # names so that they can easily be ported around.
  namelist <- NULL
  hist <- plot
  callpop <- match.call()
  if (!is.null(blacklist)) {
    warning(
      option_deprecated(
        callpop, 
        "blacklist", 
        "exclude", 
        "2.8.7.", 
        "Please use `exclude` in the future"
       ), 
      immediate. = TRUE
    )
    exclude <- blacklist
  }
  if (!is.na(grep("system.file", callpop)[1])){
    popsplt <- unlist(strsplit(dat, "/"))
    namelist$File <- popsplt[length(popsplt)]
  } else if (is.genind(dat)){
    namelist$File <- as.character(callpop[2])
  } else {
    namelist$File <- basename(x$X)
  }
  if (toupper(sublist[1]) == "TOTAL" & length(sublist) == 1){
    dat           <- x$GENIND
    pop(dat)      <- rep("Total", nInd(dat))
    poplist       <- NULL
    poplist$Total <- dat
  } else {
    dat <- popsub(x$GENIND, sublist = sublist, exclude = exclude)
    if (any(levels(pop(dat)) == "")) {
      levels(pop(dat))[levels(pop(dat)) == ""] <- "?"
      warning("missing population factor replaced with '?'")
    }
    pdrop   <- if (dat$type == "PA") FALSE else TRUE
    poplist <- if (is.null(pop(dat))) NULL else seppop(dat, drop = pdrop)
  }

  # Creating the genotype matrix for vegan's diversity analysis.
  pop.mat <- mlg.matrix(dat)
  if (total == TRUE & !is.null(poplist) & length(poplist) > 1){
    poplist$Total <- dat
    pop.mat       <- rbind(pop.mat, colSums(pop.mat))
  }
  sublist <- names(poplist)
  Iout    <- NULL
  total   <- toupper(total)
  missing <- toupper(missing)
  # For presence/absences markers, a different algorithm is applied. 
  if (legend) poppr_message()
  
  MLG.vec <- rowSums(ifelse(pop.mat > 0, 1, 0))
  N.vec   <- rowSums(pop.mat)
  datploid <- unique(ploidy(dat))
  Hexp_correction <- 1
  if (length(datploid) > 1 || any(datploid > 2)){
    datploid <- NULL
    Hexp_correction <- N.vec/(N.vec - 1)
  }
  divmat <- diversity_stats(pop.mat, ...)
  if (!is.matrix(divmat)){
    divmat <- matrix(divmat, nrow = 1, dimnames = list(NULL, names(divmat)))
  }
  
  if (!is.null(poplist)){
    # rarefaction giving the standard errors. This will use the minimum pop size
    # above a user-defined threshold.
    raremax <- ifelse(is.null(nrow(pop.mat)), sum(pop.mat), 
                      ifelse(min(rowSums(pop.mat)) > minsamp, 
                             min(rowSums(pop.mat)), minsamp))

    Hexp <- vapply(lapply(poplist, pegas::as.loci), FUN = get_hexp_from_loci, 
                   FUN.VALUE = numeric(1), ploidy = datploid, type = dat@type)

    Hexp   <- data.frame(Hexp = Hexp)
    N.rare <- suppressWarnings(vegan::rarefy(pop.mat, raremax, se = TRUE))
    IaList <- lapply(sublist, function(x){
      namelist <- list(file = namelist$File, population = x)
      .ia(poplist[[x]], 
          sample = sample, 
          method = method,
          quiet = quiet, 
          missing = missing, 
          hist = FALSE,
          namelist = namelist)
    })    
    names(IaList) <- sublist
    if (sample > 0){
      classtest <- summary(IaList)
      classless <- !classtest[, "Class"] %in% "ialist"
      if (any(classless)){
        no_class_pops <- paste(names(IaList[classless]), collapse = ", ")
        msg    <- paste0("values for ", no_class_pops, 
                         " could not be plotted.\n")
        IaList[classless] <- lapply(IaList[classless], function(x) list(index = x))
        warning(msg, call. = FALSE)
      }
      if (plot){
        try(print(poppr.plot(sample = IaList[!classless], file = namelist$File)))
      }
      IaList <- data.frame(t(vapply(IaList, "[[", numeric(4), "index")))
    } else {
      IaList <- t(as.data.frame(IaList))
    }
    Iout <- as.data.frame(
      list(
        Pop = sublist,
        N = N.vec,
        MLG = MLG.vec,
        eMLG = N.rare[1, ],
        SE = N.rare[2, ],
        divmat,
        Hexp,
        IaList,
        File = namelist$File
      ),
      stringsAsFactors = FALSE
    ) 
    rownames(Iout) <- NULL
  } else { 
    # rarefaction giving the standard errors. No population structure means that
    # the sample is equal to the number of individuals.
    N.rare <- rarefy(pop.mat, sum(pop.mat), se = TRUE)
    Hexp   <- get_hexp_from_loci(pegas::as.loci(dat), 
                                 ploidy = datploid, type = dat@type)
    Hexp   <- data.frame(Hexp = Hexp)
    IaList <-.ia(dat, 
                 sample = sample, 
                 method = method, 
                 quiet = quiet,
                 missing = missing, 
                 namelist = list(File = namelist$File, population = "Total"),
                 hist = plot
                )
    IaList <- if (sample > 0) IaList$index else IaList
    Iout <- as.data.frame(list(
      Pop = "Total",
      N = N.vec,
      MLG = MLG.vec,
      eMLG = N.rare[1, ],
      SE = N.rare[2, ],
      divmat,
      Hexp,
      as.data.frame(t(IaList)),
      File = namelist$File
    ), stringsAsFactors = FALSE) 
    rownames(Iout) <- NULL
  }
  class(Iout) <- c("popprtable", "data.frame")
  return(Iout) 
}

#' Process a list of files with poppr
#'
#' poppr.all is a wrapper function that will loop through a list of files from
#' the working directory, execute [poppr()], and concatenate the
#' output into one data frame.
#'
#' @param filelist a list of files in the current working directory
#'
#' @param ... arguments passed on to poppr
#'
#' @return see [poppr()]
#'
#' @seealso [poppr()], [getfile()]
#' @export
#' @author Zhian N. Kamvar
#' @examples
#' \dontrun{
#' # Obtain a list of fstat files from a directory.
#' x <- getfile(multi=TRUE, pattern="^.+?dat$")
#'
#' # run the analysis on each file.
#' poppr.all(file.path(x$path, x$files))
#' }
poppr.all <- function(filelist, ...){
  result <- NULL
  # seq_along() avoids iterating over c(1, 0) when filelist is empty
  for (a in seq_along(filelist)){
    cat(" \\    \n")
    input <- filelist[[a]]
    if (is.genind(input)){
      file <- names(filelist)[a]
      if (is.null(file)){
        file <- a
      }
      cat("  | Data: ")
    } else {
      file <- basename(input)
      cat("  | File: ")
    }
    cat(file, "\n /    \n")
    res      <- poppr(input, ...)
    res$File <- file
    result   <- rbind(result, res)
  }
  return(result)
}

#' Index of Association
#' 
#' Calculate the Index of Association and Standardized Index of Association.
#' 
#' * [ia()] calculates the index of association over all loci in the data set.
#' * [pair.ia()] calculates the index of association in a pairwise manner
#'   among all loci.
#' * [resample.ia()] calculates the index of association on a reduced data set
#'   multiple times to create a distribution, showing the variation of values
#'   observed at a given sample size (previously [jack.ia()]).
#' 
#' 
#' @param gid a [adegenet::genind()] or [genclone()] object.
#' @param sample an integer indicating the number of permutations desired 
#'   (e.g. 999).
#' @param method an integer from 1 to 4 indicating the sampling method desired.
#'   see [shufflepop()] for details.
#' @param quiet Should the function print anything to the screen while it is 
#'   performing calculations?
#'   `TRUE` prints nothing.
#'   `FALSE` (default) will print the population name and progress bar.
#' @param missing a character string. see [missingno()] for details.
#' @param plot When `TRUE` (default), a heatmap of the values per locus pair 
#'   will be plotted (for [pair.ia()]). When `sample > 0`, different things
#'   happen with [ia()] and [pair.ia()]. For [ia()], a histogram for the data
#'   set is plotted. For [pair.ia()], p-values are added as text on the
#'   heatmap. 
#' @param hist `logical` Deprecated. Use plot.
#' @param index `character` either "Ia" or "rbarD". If `plot = TRUE`, 
#'   this indicates which index you want represented in the plot (default:
#'   "rbarD").
#' @param valuereturn `logical` if `TRUE`, the index values from the 
#'   reshuffled data are returned. If `FALSE` (default), the index is 
#'   returned with associated p-values in a 4 element numeric vector.
#' @return 
#' ## for [pair.ia()]
#'
#' A matrix with two columns and choose(nLoc(gid), 2) rows representing the
#' values for Ia and rbarD per locus pair.
#' 
#' ## If no sampling has occurred:
#'
#' A named numeric vector of length 2 giving the Index of Association, "Ia";
#' and the Standardized Index of Association, "rbarD" 
#' 
#' ## If there is sampling:
#'
#' A named numeric vector of length 4 with the following values:
#' 
#' * Ia - numeric. The index of association. 
#' * p.Ia - A number indicating the p-value resulting from a one-sided
#'   permutation test based on the number of samples indicated in the 
#'   original call.
#' * rbarD - numeric. The standardized index of association.
#' * p.rD - A number indicating the p-value resulting from a
#'   one-sided permutation test based on the number of samples indicated in
#'   the original call. 
#' 
#' ## If there is sampling and `valuereturn = TRUE`
#'
#' A list with the following elements:
#' 
#' * index - The above vector.
#' * samples - A data frame with s rows and 2 columns, where s is the
#'   number of samples defined. The columns are for the values of Ia and
#'   rbarD, respectively.
#'
#'
#' @note [jack.ia()] is deprecated as the name was misleading. Please use
#'   [resample.ia()]
#' @details 
#' The index of association was originally developed by A.H.D. Brown while
#' analyzing the population structure of wild barley (Brown et al., 1980). It
#' has been widely used as a tool to detect clonal reproduction within
#' populations. Populations whose members are undergoing sexual reproduction,
#' whether it be selfing or out-crossing, will produce gametes via meiosis, and
#' thus have a chance to shuffle alleles in the next generation. Populations
#' whose members are undergoing clonal reproduction, however, generally do so
#' via mitosis. This means that the most likely mechanism for a change in
#' genotype is via mutation. The rate of mutation varies from species to
#' species, but it is rarely sufficiently high to approximate a random
#' shuffling of alleles. The index of association is a calculation based on the
#' ratio of the variance of the raw number of differences between individuals
#' and the sum of those variances over each locus. You can also think of it as
#' the observed variance over the expected variance. If they are the same, then
#' the index is zero after subtracting one (from Maynard Smith et al., 1993): 
#' \deqn{I_A = \frac{V_O}{V_E}-1}{Ia = (Vo/Ve) - 1} 
#'
#' Since the distance is more or less a binary distance, any sort of marker can
#' be used for this analysis. In the calculation, phase is not considered, and
#' any difference increases the distance between two individuals. Remember that
#' each column of the genotype table represents a different allele and that
#' each entry represents the fraction of the genotype made up by that allele at
#' that locus, so the entries in each row sum to one within a locus. Poppr uses
#' this to calculate distances by simply taking the sum of the absolute values
#' of the differences between rows.
#'
#' The calculation for the distance between two individuals at a single locus 
#' with _a_ allelic states and a ploidy of _k_ is as follows (except
#' for Presence/Absence data): 
#'
#' \deqn{ d = \displaystyle \frac{k}{2}\sum_{i=1}^{a} \mid A_{i} - B_{i}\mid}{d(A,B) = (k/2)*sum(abs(Ai - Bi))} 
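#'
#' As a small worked example of this distance, consider two hypothetical
#' diploid individuals at a locus with three alleles. `locus_distance()` and
#' the two vectors are names invented for illustration and are not part of the
#' package.
#'
#' ```r
#' locus_distance <- function(A, B, k = 2) (k / 2) * sum(abs(A - B))
#' A <- c(a1 = 0.5, a2 = 0.5, a3 = 0)  # heterozygote a1/a2
#' B <- c(a1 = 0,   a2 = 1,   a3 = 0)  # homozygote a2/a2
#' locus_distance(A, B)                # the genotypes differ by one allele
#' ```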
#'
#' To find the total number of differences between two individuals over all
#' loci, you sum _d_ over all _m_ loci, a value we'll call _D_:
#'
#' \deqn{D = \displaystyle \sum_{i=1}^{m} d_i }{D = sum(di)}
#'
#' These values are calculated over all possible pairs of individuals 
#' in the data set, \eqn{{n \choose 2}}{choose(n, 2)}, after which you end up 
#' with \eqn{{n \choose 2}\cdot{}m}{choose(n, 2) * m} values of _d_ and 
#' \eqn{{n \choose 2}}{choose(n, 2)} values of _D_. Calculating the 
#' observed variances is fairly straightforward (modified from Agapow and 
#' Burt, 2001):
#'
#' \deqn{ V_O = \frac{\displaystyle \sum_{i=1}^{n \choose 2} D_{i}^2 - 
#' \frac{(\displaystyle\sum_{i=1}^{n \choose 2} D_{i})^2}{{n \choose 2}}}{{n 
#' \choose 2}}}{Vo = var(D)}
#'
#' The expected variance is built from the variances of the individual loci.
#' The calculation at a single locus, _j_, is the same as the previous
#' equation, with the values of _D_ replaced by the values of _d_ at that locus:
#'
#' \deqn{ var_j = \frac{\displaystyle \sum_{i=1}^{n \choose 2} d_{i}^2 - 
#' \frac{(\displaystyle\sum_{i=1}^{n \choose 2} d_i)^2}{{n \choose 2}}}{{n 
#' \choose 2}} }{Varj = var(dj)}
#'
#' The expected variance is then the sum of all the variances over all _m_
#' loci:
#'
#' \deqn{ V_E = \displaystyle \sum_{j=1}^{m} var_j }{Ve = sum(var(dj))}
#'
#' Agapow and Burt showed that \eqn{I_A}{Ia} increases steadily with the number
#' of loci, so they came up with a standardized measure that is widely used,
#' \eqn{\bar r_d}{rbarD}. For the derivation, see the manual for
#' _multilocus_.
#'
#' \deqn{ \bar r_d = \frac{V_O - V_E} {2\displaystyle 
#' \sum_{j=1}^{m}\displaystyle \sum_{k > j}^{m}\sqrt{var_j\cdot{}var_k}} 
#' }{rbarD = (Vo - Ve)/(2*sum(sum(sqrt(var(dj)*var(dk)))))}
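#'
#' The equations above can be sketched in base R as follows. This is only an
#' illustration under stated assumptions, not the routine poppr uses
#' internally: `ia_sketch()` and the toy matrix `d` are hypothetical names,
#' `d` is assumed to hold the per-locus distances with one row per pair of
#' individuals and one column per locus, and the denominator of rbarD is
#' assumed to count each pair of loci once.
#'
#' ```r
#' ia_sketch <- function(d) {
#'   # variance with divisor choose(n, 2), as in the equations for Vo and var_j
#'   pop_var <- function(x) mean(x^2) - mean(x)^2
#'   D    <- rowSums(d)            # total differences per pair of individuals
#'   Vo   <- pop_var(D)            # observed variance
#'   varj <- apply(d, 2, pop_var)  # variance at each locus
#'   Ve   <- sum(varj)             # expected variance
#'   sq   <- sqrt(outer(varj, varj))
#'   denom <- 2 * sum(sq[upper.tri(sq)])  # pairs of loci with k > j
#'   c(Ia = Vo / Ve - 1, rbarD = (Vo - Ve) / denom)
#' }
#' # toy data: 4 individuals (6 pairs), 2 loci with identical distance patterns
#' d <- cbind(loc1 = c(0, 1, 1, 1, 1, 0), loc2 = c(0, 1, 1, 1, 1, 0))
#' ia_sketch(d)
#' ```
#'
#' With two perfectly correlated loci, as in the toy matrix, both indices
#' equal 1.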
#'
#' @references 
#' Paul-Michael Agapow and Austin Burt. Indices of multilocus 
#' linkage disequilibrium. _Molecular Ecology Notes_, 1(1-2):101-102, 
#' 2001
#'
#' A.H.D. Brown, M.W. Feldman, and E. Nevo. Multilocus structure of natural 
#' populations of _Hordeum spontaneum_. _Genetics_, 96(2):523-536, 1980.
#'
#' J M Smith, N H Smith, M O'Rourke, and B G Spratt. How clonal are bacteria? 
#' Proceedings of the National Academy of Sciences, 90(10):4384-4388, 1993.
#'
#' @seealso [poppr()], [missingno()], 
#'   [import2genind()], [read.genalex()], 
#'   [clonecorrect()], [win.ia()], [samp.ia()]
#'
#' @export
#' @md
#' @rdname ia
#' @author Zhian N. Kamvar
#' @examples
#' data(nancycats)
#' ia(nancycats)
#' 
#' # Pairwise over all loci:
#' data(partial_clone)
#' res <- pair.ia(partial_clone)
#' plot(res, low = "black", high = "green", index = "Ia")
#' 
#' # Resampling
#' data(Pinf)
#' resample.ia(Pinf, reps = 99)
#' 
#' \dontrun{
#' 
#' # Pairwise IA with p-values (this will take about a minute)
#' res <- pair.ia(partial_clone, sample = 999)
#' head(res)
#' 
#' # Plot the results of resampling rbarD. 
#' library("ggplot2")
#' Pinf.resamp <- resample.ia(Pinf, reps = 999)
#' ggplot(Pinf.resamp[2], aes(x = rbarD)) +
#'   geom_histogram() +
#'   geom_vline(xintercept = ia(Pinf)[2]) +
#'   geom_vline(xintercept = ia(clonecorrect(Pinf))[2], linetype = 2) +
#'   xlab(expression(bar(r)[d]))
#' 
#' # Get the indices back and plot the distributions.
#' nansamp <- ia(nancycats, sample = 999, valuereturn = TRUE)
#' 
#' plot(nansamp, index = "Ia")
#' plot(nansamp, index = "rbarD")
#' 
#' # You can also adjust the parameters for how large to display the text
#' # so that it's easier to export it for publication/presentations.
#' library("ggplot2")
#' plot(nansamp, labsize = 5, linesize = 2) +
#'   theme_bw() +                                      # adding a theme
#'   theme(text = element_text(size = rel(5))) +       # changing text size
#'   theme(plot.title = element_text(size = rel(4))) + # changing title size
#'   ggtitle("Index of Association of nancycats")      # adding a new title
#' 
#' # Get the index for each population.
#' lapply(seppop(nancycats), ia)
#' # With sampling
#' lapply(seppop(nancycats), ia, sample = 999)
#' 
#' # Plot pairwise ia for all populations in a grid with cowplot
#' # Set up the library and data
#' library("cowplot")
#' data(monpop)
#' splitStrata(monpop) <- ~Tree/Year/Symptom
#' setPop(monpop)      <- ~Tree
#' 
#' # Need to set up a list in which to store the plots.
#' plotlist        <- vector(mode = "list", length = nPop(monpop))
#' names(plotlist) <- popNames(monpop)
#' 
#' # Loop through the populations, calculate pairwise ia, plot, and then
#' # capture the plot in the list
#' for (i in popNames(monpop)){
#'   x <- pair.ia(monpop[pop = i], limits = c(-0.15, 1)) # subset, calculate, and plot
#'   plotlist[[i]] <- ggplot2::last_plot() # save the last plot
#' }
#' 
#' # Use the plot_grid function to plot.
#' plot_grid(plotlist = plotlist, labels = paste("Tree", popNames(monpop)))
#' 
#' }
ia <- function(gid, sample = 0, method = 1, quiet = FALSE, missing = "ignore", 
               plot = TRUE, hist = TRUE, index = "rbarD", valuereturn = FALSE){
  namelist <- list(population = ifelse(nPop(gid) > 1 | is.null(gid@pop), 
                                       "Total", popNames(gid)),
                   File = as.character(match.call()[2])
                  )
  hist    <- plot
  popx    <- gid
  missing <- toupper(missing)
  type    <- gid@type
  quiet   <- should_poppr_be_quiet(quiet)
  if (type == "PA"){
    .Ia.Rd <- .PA.Ia.Rd
  } else {
    popx <- seploc(popx)
  }

  # If there are fewer than three individuals in the population, the
  # calculation does not proceed and NA values are returned.
  if (nInd(gid) < 3){
    IarD <- stats::setNames(as.numeric(c(NA, NA)), c("Ia", "rbarD"))
    if (sample == 0){
      return(IarD)
    } else {
      IarD <- stats::setNames(as.numeric(rep(NA, 4)), c("Ia","p.Ia","rbarD","p.rD"))
      return(IarD)
    }
  }
  
  IarD <- .Ia.Rd(popx, missing)
  names(IarD) <- c("Ia", "rbarD")
  # no sampling, it will simply return two named numbers.
  if (sample == 0){
    Iout   <- IarD
    result <- NULL
  } else {
  # sampling will perform the iterations and then return a data frame indicating
  # the population, index, observed value, and p-value. It will also produce a 
  # histogram.
    Iout     <- NULL 
    # idx      <- data.frame(Index = names(IarD))
    if (quiet) {
      oh <- progressr::handlers()
      on.exit(progressr::handlers(oh))
      progressr::handlers("void")
    }
    progressr::with_progress({
      samp <- .sampling(
        popx, sample, missing, quiet = quiet, type = type, method = method
      )
    })
    p.val    <- sum(IarD[1] <= c(samp$Ia, IarD[1]))/(sample + 1)
    p.val[2] <- sum(IarD[2] <= c(samp$rbarD, IarD[2]))/(sample + 1)

    if (hist == TRUE){
      the_plot <- poppr.plot(samp, observed = IarD, pop = namelist$population,
        index = index, file = namelist$File, pval = p.val, N = nrow(gid@tab)
      )
      print(the_plot)
    }
    result <- stats::setNames(vector(mode = "numeric", length = 4), 
                       c("Ia","p.Ia","rbarD","p.rD"))
    result[c(1, 3)] <- IarD
    result[c(2, 4)] <- p.val
    if (valuereturn == TRUE){
      iaobj        <- list(index = final(Iout, result), samples = samp)
      class(iaobj) <- "ialist"
      return(iaobj)
    } 
  }  
  return(final(Iout, result))
}
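
# Illustrative sketch (not part of poppr's API): the one-sided permutation
# p-value computed in ia() above, isolated in base R. The helper name
# `sketch_perm_pvalue` and its arguments are hypothetical. The observed value
# is counted among the permuted values, so the smallest attainable p-value is
# 1/(n + 1) rather than 0.
sketch_perm_pvalue <- function(observed, permuted) {
  # Proportion of values (permuted plus the observed one) that are greater
  # than or equal to the observed statistic.
  sum(observed <= c(permuted, observed)) / (length(permuted) + 1)
}
# Usage: sketch_perm_pvalue(0.5, c(0.1, 0.2, 0.7))  # (1 + 1) / (3 + 1) = 0.5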

#' @rdname ia
#' @param low (for pair.ia) a color to use for low values when `plot =
#'   TRUE`
#' @param high (for pair.ia) a color to use for high values when `plot =
#'   TRUE`
#' @param limits (for pair.ia) the limits to be used for the color scale. 
#'   Defaults to `NULL`. If you want to use a custom range, supply two
#'   numbers between -1 and 1, (e.g. `limits = c(-0.15, 1)`)
#' @export
pair.ia <- function(gid, sample = 0L, quiet = FALSE, plot = TRUE, low = "blue", 
                    high = "red", limits = NULL, index = "rbarD", method = 1L){
  N       <- nInd(gid)
  numLoci <- nLoc(gid)
  lnames  <- locNames(gid)
  np      <- choose(N, 2)
  nploci  <- choose(numLoci, 2)
  shuffle <- sample > 0L
  if (quiet) {
    oh <- progressr::handlers()
    on.exit(progressr::handlers(oh))
    progressr::handlers("void")
  }
  progressr::with_progress({
    p <- make_progress((1 + sample) * nploci, 50)
    res <- pair_ia_internal(gid, N, numLoci, lnames, np, nploci, p, sample = 0)
    if (shuffle) {
      # Initialize with 1 to account for the observed data.
      counts <- matrix(1L, nrow = nrow(res), ncol = ncol(res))
      for (i in seq_len(sample)) {
        tmp    <- shufflepop(gid, method = method)
        tmpres <- pair_ia_internal(tmp, N, numLoci, lnames, np, nploci, p, i)
        counts <- counts + as.integer(tmpres >= res)
      }
      p   <- counts/(sample + 1)
      res <- cbind(Ia = res[, 1],
                   p.Ia = p[, 1],
                   rbarD = res[, 2],
                   p.rD = p[, 2])
    }
  })
  class(res) <- c("pairia", "matrix")
  if (plot) {
    tryCatch(plot(res, index = index, low = low, high = high, limits = limits),
             error = function(e) e)
  }
  res
}


pair_ia_internal <- function(gid, N, numLoci, lnames, np, nploci, p, sample = NULL) {
  # Calculate pairwise distances for each locus. This will be a matrix of 
  # np rows and numLoci columns.
  if (gid@type == "codom") {
    V <- pair_matrix(seploc(gid), numLoci, np)
  } else { # P/A case
    V <- apply(tab(gid), 2, function(x) as.vector(dist(x)))
    # checking for missing data and imputing the comparison to zero.
    if (any(is.na(V))) {
      V[which(is.na(V))] <- 0
    }
  }
  colnames(V) <- lnames

  # calculate I_A and \bar{r}_d for each combination of loci
  loci_pairs  <- combn(lnames, 2)
  ia_pairs    <- matrix(NA_real_, nrow = 2, ncol = nploci)
  for (i in seq(nploci)) {
    if ((nploci * sample + i) %% p$step == 0) p$rog() 
    the_pair <- loci_pairs[, i, drop = TRUE]
    newV <- V[, the_pair, drop = FALSE]
    ia_pairs[, i] <- ia_from_d_and_D(
      V = list(
        d.vector  = colSums(newV), 
        d2.vector = colSums(newV * newV), 
        D.vector  = rowSums(newV)
      ),
      np = np
    )
  }
  colnames(ia_pairs) <- apply(loci_pairs, 2, paste, collapse = ":")
  rownames(ia_pairs) <- c("Ia", "rbarD")
  ia_pairs           <- t(ia_pairs)
  ia_pairs
}
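
# Illustrative sketch (not part of poppr's API): how the per-pair quantities
# assembled in pair_ia_internal() could be turned into I_A and rbarD. It
# assumes the usual definitions, I_A = V_O/V_E - 1 and
# rbarD = (V_O - V_E)/(2 * sum over j < k of sqrt(var_j * var_k)), with
# divide-by-np variances. The internal ia_from_d_and_D() is not shown in this
# part of the file and may differ in detail. The name `sketch_ia_rbard` is
# hypothetical.
sketch_ia_rbard <- function(V) {
  # V: numeric matrix of pairwise distances, one row per pair of individuals
  # and one column per locus (at least two loci).
  np    <- nrow(V)
  D     <- rowSums(V)                                 # total distance per pair
  var_O <- (sum(D^2) - sum(D)^2 / np) / np            # observed variance of D
  var_j <- (colSums(V^2) - colSums(V)^2 / np) / np    # per-locus variances
  var_E <- sum(var_j)                                 # expected variance
  cross <- utils::combn(sqrt(var_j), 2, FUN = prod)   # sqrt(var_j * var_k), j < k
  c(Ia = var_O / var_E - 1, rbarD = (var_O - var_E) / (2 * sum(cross)))
}
# With exactly two loci, rbarD reduces to the Pearson correlation between the
# two columns of V, which gives a quick sanity check for the sketch.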

#' Create a table of summary statistics per locus. 
#' 
#' @param x a [adegenet::genind-class] or [genclone-class]
#'   object.
#' 
#' @param index Which diversity index to use. Choices are 
#' 
#'  * `"simpson"` (Default) to give Simpson's index 
#'  * `"shannon"` to give the Shannon-Wiener index 
#'  * `"invsimpson"` to give the Inverse Simpson's index aka the Stoddart and
#'    Taylor index.
#' @param lev At what level do you want to analyze diversity? Choices are
#'   `"allele"` (Default) or `"genotype"`.
#' @param population Select the populations to be analyzed. This is the
#'   parameter `sublist` passed on to the function [popsub()].
#'   Defaults to `"ALL"`.
#' @param information When `TRUE` (Default), this will print out a header
#'   of information to the R console.
#' @return a table with 4 columns indicating the Number of alleles/genotypes 
#'   observed, Diversity index chosen, Nei's 1978 gene diversity (expected
#'   heterozygosity), and Evenness.
#' @seealso [vegan::diversity()], [poppr()]
#' @md
#'
#' @note The calculation of `Hexp` is \eqn{\left(\frac{n}{n-1}\right) \left(1 -
#' \sum_{i = 1}^k{p^{2}_{i}}\right)}{(n/(n - 1))*(1 - sum(p^2))}, where p is the
#' allele frequencies at a given locus and n is the number of observed alleles
#' (Nei, 1978). It is calculated for each locus, and the average over loci is
#' reported in the `mean` row. Caution should be exercised in interpreting the
#' results of Hexp with polyploid organisms with ambiguous ploidy. The lack of
#' allelic dosage information will cause rare alleles to be over-represented
#' and artificially inflate the index. This is especially true with small
#' sample sizes.
#'
#' If `lev = "genotype"`, then all statistics reflect **genotypic** diversity
#' within each locus. This includes the calculation for `Hexp`, which turns
#' into the unbiased Simpson's index.
#'
#' @author Zhian N. Kamvar
#' 
#' @references
#'   Jari Oksanen, F. Guillaume Blanchet, Roeland Kindt, Pierre Legendre, Peter 
#'   R. Minchin, R. B. O'Hara, Gavin L. Simpson, Peter Solymos, M. Henry H. 
#'   Stevens, and Helene Wagner. vegan: Community Ecology Package, 2012. R 
#'   package version 2.0-5.
#' 
#'   Niklaus J. Gr\"unwald, Stephen B. Goodwin, Michael G. Milgroom, and William
#'   E. Fry. Analysis of genotypic diversity data for populations of 
#'   microorganisms. Phytopathology, 93(6):738-46, 2003.
#'
#'   J.A. Ludwig and J.F. Reynolds. Statistical Ecology. A Primer on Methods and
#'   Computing. New York USA: John Wiley and Sons, 1988.
#'
#'   E.C. Pielou. Ecological Diversity. Wiley, 1975.
#'
#'   J.A. Stoddart and J.F. Taylor. Genotypic diversity: estimation and
#'   prediction in samples. Genetics, 118(4):705-11, 1988.
#'
#'   Masatoshi Nei. Estimation of average heterozygosity and genetic distance 
#'   from a small number of individuals. Genetics, 89(3):583-590, 1978.
#' 
#'   Claude Elwood Shannon. A mathematical theory of communication. Bell System
#'   Technical Journal, 27:379-423,623-656, 1948.
#'
#' @export
#' @examples
#' 
#' data(nancycats)
#' locus_table(nancycats[pop = 5])
#' \dontrun{
#' # Analyze locus statistics for the North American population of P. infestans.
#' # Note that due to the unknown dosage of alleles, many of these statistics
#' # will be artificially inflated for polyploids.
#' data(Pinf)
#' locus_table(Pinf, population = "North America")
#' }

locus_table <- function(x, index = "simpson", lev = "allele", 
                        population = "ALL", information = TRUE){
  ploid   <- unique(ploidy(x))
  type    <- x@type
  INDICES <- c("shannon", "simpson", "invsimpson")
  index   <- match.arg(index, INDICES)
  x       <- popsub(x, population, drop = FALSE)
  x.loc   <- summary(as.loci(x))
  outmat  <- vapply(x.loc, locus_table_pegas, numeric(4), index, lev, ploid, type)
  loci    <- colnames(outmat)
  divs    <- rownames(outmat)
  res     <- matrix(0.0, nrow = ncol(outmat) + 1, ncol = nrow(outmat))
  dimlist <- list(`locus` = c(loci, "mean"), `summary` = divs)
  res[-nrow(res), ]     <- t(outmat)
  res[nrow(res), ]      <- colMeans(res[-nrow(res), ], na.rm = TRUE)
  attr(res, "dimnames") <- dimlist
  if (information){
    if (index == "simpson"){
      msg <- "Simpson index"
    } else if (index == "shannon"){
      msg <- "Shannon-Wiener index"
    } else {
      msg <- "Stoddart and Taylor index"
    }
    message("\n", divs[1], " = Number of observed ", paste0(divs[1], "s"), appendLF = FALSE)
    message("\n", divs[2], " = ", msg, appendLF = FALSE)
    message("\n", divs[3], " = Nei's 1978 gene diversity\n", appendLF = FALSE)
    message("------------------------------------------\n", appendLF = FALSE)
  }
  class(res) <- c("locustable", "matrix")
  return(res)
}
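
# Illustrative sketch (not part of poppr's API): Nei's 1978 gene diversity for
# a single locus, following the formula in the @note above,
# (n/(n - 1)) * (1 - sum(p^2)). It assumes `counts` holds the number of times
# each allele was observed at the locus, so that n is their total. The name
# `sketch_hexp` is hypothetical.
sketch_hexp <- function(counts) {
  n <- sum(counts)   # total number of observed alleles at the locus
  p <- counts / n    # allele frequencies
  (n / (n - 1)) * (1 - sum(p^2))
}
# Usage: sketch_hexp(c(A = 5, B = 5))  # two equally frequent alleles, n = 10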


#' Tabulate alleles that occur in only one population. 
#' 
#' @param gid a [adegenet::genind-class] or [genclone-class]
#'   object.
#'
#' @param form a [formula()] giving the levels of markers and 
#'   hierarchy to analyze. See Details.
#'
#' @param report one of `"table", "vector",` or `"data.frame"`. Tables
#'   (Default) and data frame will report counts along with populations or 
#'   individuals. Vectors will simply report which populations or individuals 
#'   contain private alleles. Tables are matrices with populations or 
#'   individuals in rows and alleles in columns. Data frames are long form.
#'
#' @param level one of `"population"` (Default) or `"individual"`.
#'
#' @param count.alleles `logical`. If `TRUE` (Default), the report 
#'   will return the observed number of alleles private to each population. If 
#'   `FALSE`, each private allele will be counted once, regardless of 
#'   dosage.
#' 
#' @param drop `logical`. If `TRUE`, populations/individuals without 
#'   private alleles will be dropped from the result. Defaults to `FALSE`.
#'
#' @return a matrix, data.frame, or vector defining the populations or
#'   individuals containing private alleles. If vector is chosen, alleles are
#'   not defined.
#'
#' @details The argument `form` allows for control over the strata at which
#'   private alleles should be computed. The left hand side of the formula can
#'   be "alleles", "locus", or "loci". The right hand side is "." by default.
#'   If you change it, it must correspond to strata located in the
#'   [adegenet::strata()] slot. Note that the right hand side is disabled for
#'   genpop objects.
#' 
#' @export
#' @author Zhian N. Kamvar
#' @md
#' @examples
#' 
#' data(Pinf) # Load P. infestans data.
#' private_alleles(Pinf)
#' 
#' \dontrun{
#' # Analyze private alleles based on the country of interest:
#' private_alleles(Pinf, alleles ~ Country)
#' 
#' # Number of observed alleles per locus
#' private_alleles(Pinf, locus ~ Country, count.alleles = TRUE)
#' 
#' # Get raw number of private alleles per locus.
#' (pal <- private_alleles(Pinf, locus ~ Country, count.alleles = FALSE))
#' 
#' # Get percentages.
#' sweep(pal, 2, nAll(Pinf)[colnames(pal)], FUN = "/")
#' 
#' # An example of how these data can be displayed.
#' library("ggplot2")
#' Pinfpriv <- private_alleles(Pinf, report = "data.frame")
#' ggplot(Pinfpriv) + geom_tile(aes(x = population, y = allele, fill = count))
#' }
private_alleles <- function(gid, form = alleles ~ ., report = "table", 
                            level = "population", count.alleles = TRUE,
                            drop = FALSE){
  REPORTARGS <- c("table", "vector", "data.frame")
  LEVELARGS  <- c("individual", "population")
  LHS_ARGS <- c("alleles", "locus", "loci")
  showform <- utils::capture.output(print(form))
  marker <- pmatch(as.character(form[[2]]), LHS_ARGS, nomatch = 0L, 
                   duplicates.ok = FALSE)
  if (all(marker == 0L)){
    stop("Left hand side of ", showform, " must be one of:\n ",
         paste(LHS_ARGS, collapse = " "))
  } else {
    marker <- LHS_ARGS[marker]
  }
  strataform <- form[c(1, 3)]
  the_strata <- all.vars(strataform[[2]])
  if (length(the_strata) > 1 || the_strata[1] != "."){
    if (!is.genpop(gid)){
      setPop(gid) <- strataform
    } else {
      warning("cannot set strata for a genpop object.")
    }
  } 
  report <- match.arg(report, REPORTARGS)
  level  <- match.arg(level, LEVELARGS)
  if (!is.genind(gid) & !is.genpop(gid)){
    stop(paste(gid, "is not a genind or genpop object."))
  }
  if ((is.genind(gid) && !is.null(pop(gid))) || (is.genpop(gid) && nPop(gid) > 1)){
    if (is.genind(gid)){
      gid.pop <- tab(genind2genpop(gid, quiet = TRUE))
    } else {
      gid.pop <- tab(gid)
    }
    private_columns <- colSums(ifelse(gid.pop > 0, 1, 0), na.rm = TRUE) < 2
    privates <- gid.pop[, private_columns, drop = FALSE]
    if (level == "individual" && is.genind(gid)){
      gid.tab  <- tab(gid)
      privates <- gid.tab[, private_columns, drop = FALSE]
    } else if (!count.alleles){
      privates <- ifelse(privates > 0, 1, 0)
    }
    
    if (drop){
      privates <- privates[rowSums(privates, na.rm = TRUE) > 0, , drop = FALSE]
    }
    
    if (marker != "alleles"){
      private_fac <- locFac(gid)[private_columns]
      privates <- vapply(unique(private_fac), function(l){
        rowSums(privates[, private_fac == l, drop = FALSE], na.rm = TRUE)
      }, FUN.VALUE = numeric(nrow(privates))
      )
      colnames(privates) <- locNames(gid)[unique(private_fac)]
    }
    if (length(privates) == 0){
      privates <- NULL
      cat("No private alleles detected.")
      return(invisible(NULL))
    }
    if (report == "vector"){
      privates <- rownames(privates)
    } else if (report == "data.frame"){
      marker   <- if (marker == "alleles") "allele" else "locus"
      names(dimnames(privates)) <- c(level, marker)
      privates <- as.data.frame.table(privates, 
                                      responseName = "count",
                                      stringsAsFactors = FALSE)
    }
    return(privates)
  } else {
    stop("There are no populations detected")
  }
}
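
# Illustrative sketch (not part of poppr's API): the private-allele test used
# in private_alleles() above, applied to a plain count matrix with populations
# in rows and alleles in columns. An allele is treated as private when it is
# present in fewer than two populations. The name `sketch_private_columns` and
# the toy data in the usage note are hypothetical.
sketch_private_columns <- function(pop_tab) {
  # Count the populations in which each allele is present (count > 0).
  n_pops_with_allele <- colSums(pop_tab > 0, na.rm = TRUE)
  pop_tab[, n_pops_with_allele < 2, drop = FALSE]
}
# Usage:
# tab <- rbind(popA = c(L1.1 = 3, L1.2 = 0, L2.1 = 2),
#              popB = c(L1.1 = 1, L1.2 = 4, L2.1 = 0))
# sketch_private_columns(tab)  # keeps columns L1.2 and L2.1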
grunwaldlab/poppr documentation built on March 18, 2024, 11:24 p.m.