R/RcppExports.R
In poolfstat: Computing f-Statistics and Building Admixture Graphs Based on Allele Count or Pool-Seq Read Count Data

Documented in .block_sum .block_sum2 .compute_blockDdenom .compute_F3fromF2 .compute_F3fromF2samples .compute_F4DfromF2samples .compute_F4fromF2 .compute_F4fromF2samples .compute_H1 .compute_Q2 .compute_QmatfromF2samples .compute_snpFstAov .compute_snpHierFstAov .compute_snpQ1 .compute_snpQ1onepop .compute_snpQ1rw .compute_snpQ2 .compute_snpQ2onepair .compute_snpQ2rw .extract_allele_names .extract_nonvscan_counts .extract_vscan_counts .find_indelneighbor_idx .generateF3names .generateF4names .scan_allele_info .simureads_mono .simureads_poly

# Generated by using Rcpp::compileAttributes() -> do not edit by hand
# Generator token: 10BE3573-1514-4C36-9D1C-5A225CD40393

#' @title scan_allele_info
#' @name scan_allele_info
#' @rdname scan_allele_info
#'
#' @description
#' Scan allele information in ALT field of a vcf
#'
#' @param allele_info a character string vector (ALT field of the vcf)
#'
#' @details
#' Scan allele information in the ALT field of a vcf to identify the number of alleles and whether an indel is present
#'
#' @return Return a vector with two elements consisting of i) the number of alleles (1 + number of commas)
#' and ii) an indel flag (1 if an indel is detected, 0 otherwise)
#' 
#' @examples
#' .scan_allele_info(c("A,C","T","AAT"))
#' 
#' @export
.scan_allele_info <- function(allele_info) {
    .Call('_poolfstat_scan_allele_info', PACKAGE = 'poolfstat', allele_info)
}
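
# Illustrative pure-R sketch (not part of the generated wrappers). The hypothetical helper
# .sketch_scan_allele_info() mirrors the documented rule "number of alleles = 1 + number of commas".
# The indel rule used here (any listed allele longer than one base) and the two-column
# matrix layout are assumptions, since the compiled code is not shown in this file.
.sketch_scan_allele_info <- function(allele_info) {
  # Count commas in each ALT string; regmatches() yields character(0) when there is none
  n_comma <- lengths(regmatches(allele_info, gregexpr(",", allele_info, fixed = TRUE)))
  # Flag entries where at least one comma-separated allele spans more than one base
  indel <- vapply(strsplit(allele_info, ",", fixed = TRUE),
                  function(a) as.integer(any(nchar(a) > 1L)), integer(1))
  cbind(nb_all = 1L + n_comma, indel = indel)
}
# Possible usage: .sketch_scan_allele_info(c("A,C", "T", "AAT"))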

#' @title extract_vscan_counts
#' @name extract_vscan_counts
#' @rdname extract_vscan_counts
#'
#' @description
#' Extract VarScan counts
#'
#' @param vcf_data a matrix of String containing count information in VarScan format
#' @param ad_idx the index of the FORMAT AD field  
#' @param rd_idx the index of the FORMAT RD field 
#'
#' @details Extract VarScan counts and return read counts for the reference and alternate allele.
#' For VarScan generated vcf, SNPs with more than one alternate allele are discarded 
#' (because only a single count is then reported in the AD fields), which makes the min.rc option 
#' of vcf2pooldata unavailable.
#' The VarScan --min-reads2 option might replace the min.rc functionality to some extent, although 
#' SNPs where the two major alleles in the Pool-Seq data both differ from the reference allele 
#' (e.g., expected to be more frequent when using a distantly related reference genome for mapping) 
#' will be disregarded.
#' @return A numeric matrix of read count with nsnp rows and 2*npools columns.
#' The first npools columns consist of read count for the reference allele (RD),
#' columns npools+1 to 2*npools consist of read coverage (RD+AD)
#' @examples
#' .extract_vscan_counts(rbind(c("0/0:0:20","1/1:18:1"),c("0/1:12:15","1/1:27:2")),3,2)
#' 
#' @export
.extract_vscan_counts <- function(vcf_data, ad_idx, rd_idx) {
    .Call('_poolfstat_extract_vscan_counts', PACKAGE = 'poolfstat', vcf_data, ad_idx, rd_idx)
}
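
# Illustrative pure-R sketch (not part of the generated wrappers). The hypothetical helper
# .sketch_extract_vscan_counts() reproduces the documented output layout (RD columns first,
# then RD+AD coverage columns). It assumes that ad_idx and rd_idx are 1-based positions among
# the colon-separated fields, as the example above suggests, and it does not handle
# multi-allelic or missing entries.
.sketch_extract_vscan_counts <- function(vcf_data, ad_idx, rd_idx) {
  # Split every entry on ":" (entries are visited column by column)
  fields <- strsplit(as.vector(vcf_data), ":", fixed = TRUE)
  # Pull one field from each entry and restore the nsnp x npools layout
  pick <- function(idx) {
    matrix(as.numeric(vapply(fields, `[`, character(1), idx)), nrow = nrow(vcf_data))
  }
  rd <- pick(rd_idx)
  ad <- pick(ad_idx)
  cbind(rd, rd + ad)
}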

#' @title extract_nonvscan_counts
#' @name extract_nonvscan_counts
#' @rdname extract_nonvscan_counts
#'
#' @description
#' Extract counts from a vcf produced by a caller other than VarScan (e.g., bcftools, FreeBayes, GATK)
#'
#' @param vcf_data a matrix of String containing count information
#' @param nb_all a vector containing the number of alleles for the different markers
#' @param ad_idx the index of the FORMAT AD field  
#' @param min_rc Minimal allowed read count per base (same as min.rc option in \code{\link{vcf2pooldata}}) 
#'
#' @details Extract read counts from a vcf produced by a caller other than VarScan and return read counts for the reference and alternate allele
#' @return A numeric matrix of read counts with nsnp rows and 2*npools+6 columns.
#' The first npools columns consist of read counts for the reference allele,
#' columns npools+1 to 2*npools consist of read coverage. The last 6 columns correspond to 
#' the index of the two most frequent alleles (idx_all1 and idx_all2) and their counts (cnt_all1 and cnt_all2),
#' the min_rc filtering criterion, and the count of variants (cnt_bases) other than the two most frequent ones. The min_rc criterion is
#' set to -1 for polymorphisms with more than 2 alleles whose third most frequent allele has 
#' more than min_rc counts 
#' @examples
#' .extract_nonvscan_counts(rbind(c("0/0:20,0","1/1:1,18"),c("0/2:12,1,15","1/1:27,1,0")),c(2,3),2,0)
#' .extract_nonvscan_counts(rbind(c("0/0:20,0","1/1:1,18"),c("0/2:12,1,15","1/1:27,1,0")),c(2,3),2,2)
#' @export
.extract_nonvscan_counts <- function(vcf_data, nb_all, ad_idx, min_rc) {
    .Call('_poolfstat_extract_nonvscan_counts', PACKAGE = 'poolfstat', vcf_data, nb_all, ad_idx, min_rc)
}

#' @title extract_allele_names
#' @name extract_allele_names
#' @rdname extract_allele_names
#'
#' @description
#' Extract the alleles from the REF and ALT fields
#'
#' @param allele_info a character string vector (concatenated REF and ALT field of the vcf)
#' @param allele_idx Matrix with indexes of the two alleles of interest for the different markers
#'
#' @details
#' Extract the alleles from the REF and ALT fields
#' 
#' @return Return a matrix with the two alleles after parsing the alleles info
#' 
#' @examples
#' .extract_allele_names(c("A,C","A,C,T"),rbind(c(1,2),c(1,3)))
#' 
#' @export
.extract_allele_names <- function(allele_info, allele_idx) {
    .Call('_poolfstat_extract_allele_names', PACKAGE = 'poolfstat', allele_info, allele_idx)
}

#' @title find_indelneighbor_idx
#' @name find_indelneighbor_idx
#' @rdname find_indelneighbor_idx
#'
#' @description
#' Search for the closest indels of the markers
#'
#' @param contig a character string vector corresponding to the CHR field value of the vcf for the markers
#' @param position an integer vector corresponding to the POSITION value for the markers 
#' @param indels_idx vector of (0-indexed) indices of indels
#' @param min_dist same as min.dist.from.indels option in \code{\link{vcf2pooldata}}
#' @param indels_size size of the indels (associated to indels_idx)
#'
#' @details
#' Identify if the SNPs are close to an indel
#' 
#' @return Return a vector consisting of 1 (if the marker is close to an indel) or 0 (if not)
#' 
#' @examples
#' .find_indelneighbor_idx(c("chr1","chr1","chr1"),c(1000,1004,1020),1,5,2)
#' 
#' @export
.find_indelneighbor_idx <- function(contig, position, indels_idx, min_dist, indels_size) {
    .Call('_poolfstat_find_indelneighbor_idx', PACKAGE = 'poolfstat', contig, position, indels_idx, min_dist, indels_size)
}

#' @title compute_snpQ1
#' @name compute_snpQ1
#' @rdname compute_snpQ1
#'
#' @description
#' Compute SNP-specific Q1 by averaging over all samples
#'
#' @param refcount Matrix of nsnpxnpop with counts (genotype or reads) for the reference allele
#' @param totcount Matrix of nsnpxnpop with total counts or read coverages
#' @param weight Vector of length npop giving the weighting scheme (w=1 for allele count data and w=poolsize/(poolsize-1) for PoolSeq data)
#' @param verbose Logical (if TRUE progression bar is printed on the terminal)
#'
#' @details
#' Compute all the SNP-specific Q1 over all pop. samples (useful for Fst computation with method Identity). 
#' 
#' @return Return a vector of length nsnps with SNP-specific Q1
#' 
#' @examples
#' #
#' @export
.compute_snpQ1 <- function(refcount, totcount, weight, verbose) {
    .Call('_poolfstat_compute_snpQ1', PACKAGE = 'poolfstat', refcount, totcount, weight, verbose)
}

#' @title compute_snpQ1rw
#' @name compute_snpQ1rw
#' @rdname compute_snpQ1rw
#'
#' @description
#' Compute SNP-specific Q1 over all samples using weighting averages of pop. Q1 (eq. A46 in Hivert et al., 2018)
#'
#' @param refcount Matrix of nsnpxnpop with counts (genotype or reads) for the reference allele
#' @param totcount Matrix of nsnpxnpop with total counts or read coverages
#' @param weight Vector of length npop giving the weighting scheme (w=1 for allele count data and w=poolsize/(poolsize-1) for PoolSeq data)
#' @param sampsize Vector of length npop giving the haploid sample size (not used for count data)
#' @param readcount Logical (if TRUE PoolSeq data assumed i.e. weights depending on haploid size, otherwise weights depend on total counts)
#' @param verbose Logical (if TRUE progression bar is printed on the terminal)
#'
#' @details
#' Compute all the SNP-specific Q1 over all pop. samples using weighting averages of pop. Q1 as in eq. A46 of Hivert et al., 2018 (useful for Fst computation with method Identity). 
#' 
#' @return Return a vector of length nsnps with SNP-specific Q1
#' 
#' @examples
#' #
#' @export
.compute_snpQ1rw <- function(refcount, totcount, weight, sampsize, readcount, verbose) {
    .Call('_poolfstat_compute_snpQ1rw', PACKAGE = 'poolfstat', refcount, totcount, weight, sampsize, readcount, verbose)
}

#' @title compute_snpQ2
#' @name compute_snpQ2
#' @rdname compute_snpQ2
#'
#' @description
#' Compute SNP-specific Q2 by averaging over all pairs of samples
#'
#' @param refcount Matrix of nsnpxnpop with counts (genotype or reads) for the reference allele
#' @param totcount Matrix of nsnpxnpop with total counts or read coverages
#' @param pairs Matrix of npoppairsx2 giving the index for all the pairs of pops included in the computation
#' @param verbose Logical (if TRUE progression bar is printed on the terminal)
#'
#' @details
#' Compute all the SNP-specific Q2 over all pop. pairs (useful for Fst computation with method Identity). 
#' 
#' @return Return a vector of length nsnps with SNP-specific Q2
#' 
#' @examples
#' #
#' @export
.compute_snpQ2 <- function(refcount, totcount, pairs, verbose) {
    .Call('_poolfstat_compute_snpQ2', PACKAGE = 'poolfstat', refcount, totcount, pairs, verbose)
}

#' @title compute_snpQ2rw
#' @name compute_snpQ2rw
#' @rdname compute_snpQ2rw
#'
#' @description
#' Compute SNP-specific Q2 by averaging over all pairs of samples using weighting averages of pairwise Q2 (eq. A47 in Hivert et al., 2018)
#'
#' @param refcount Matrix of nsnpxnpop with counts (genotype or reads) for the reference allele
#' @param totcount Matrix of nsnpxnpop with total counts or read coverages
#' @param pairs Matrix of npoppairsx2 giving the index for all the pairs of pops included in the computation
#' @param sampsize Vector of length npop giving the haploid sample size (not used for count data)
#' @param readcount Logical (if TRUE PoolSeq data assumed i.e. weights depending on haploid size, otherwise weights depend on total counts)
#' @param verbose Logical (if TRUE progression bar is printed on the terminal)
#'
#' @details
#' Compute SNP-specific Q2 by averaging over all pairs of samples using weighting averages of pairwise Q2 (eq. A47 in Hivert et al., 2018)
#' (useful for Fst computation with method Identity). 
#' 
#' @return Return a vector of length nsnps with SNP-specific Q2
#' 
#' @examples
#' #
#' @export
.compute_snpQ2rw <- function(refcount, totcount, pairs, sampsize, readcount, verbose) {
    .Call('_poolfstat_compute_snpQ2rw', PACKAGE = 'poolfstat', refcount, totcount, pairs, sampsize, readcount, verbose)
}

#' @title compute_snpHierFstAov
#' @name compute_snpHierFstAov
#' @rdname compute_snpHierFstAov
#'
#' @description
#' Compute SNP-specific MSI, MSP, MSG, nc, nc_p and nc_pp used to derive the Anova estimator of hier. Fst for allele count or read count data (Pool-Seq)
#'
#' @param refcount Matrix of nsnpxnpop with counts (genotype or reads) for the reference allele
#' @param totcount Matrix of nsnpxnpop with total counts or read coverages
#' @param hapsize Vector of length npop giving the haploid size of each pool (if one element <=0, counts are interpreted as count data)
#' @param popgrpidx Vector of length npop giving the index (coded from 0 to ngrp-1) of the group of origin
#' @param verbose Logical (if TRUE progression bar is printed on the terminal)
#'
#' @details
#' Compute SNP-specific MSI, MSP, MSG, nc, nc_p and nc_pp used to derive the Anova estimator of hier. Fst for allele count or read count data (Pool-Seq)
#' 
#' @return Return a nsnpsx6 matrix with SNP-specific MSI, MSP, MSG, nc, nc_p and nc_pp
#' 
#' @examples
#' #
#' @export
.compute_snpHierFstAov <- function(refcount, totcount, hapsize, popgrpidx, verbose) {
    .Call('_poolfstat_compute_snpHierFstAov', PACKAGE = 'poolfstat', refcount, totcount, hapsize, popgrpidx, verbose)
}

#' @title compute_snpFstAov
#' @name compute_snpFstAov
#' @rdname compute_snpFstAov
#'
#' @description
#' Compute SNP-specific MSG, MSP and nc used to derive the Anova estimator of Fst for allele count or read count data (Pool-Seq)
#'
#' @param refcount Matrix of nsnpxnpop with counts (genotype or reads) for the reference allele
#' @param totcount Matrix of nsnpxnpop with total counts or read coverages
#' @param hapsize Vector of length npop giving the haploid size of each pool (if one element <=0, counts are interpreted as count data)
#' @param verbose Logical (if TRUE progression bar is printed on the terminal)
#'
#' @details
#' Compute SNP-specific Q1 and Q2 based on Anova estimator of Fst for allele count or read count data (Pool-Seq).
#' For allele count data, the implemented estimator corresponds to that described in Weir, 1996 (eq. 5.2)  
#' For read (Pool-Seq) data, the implemented estimator corresponds to that described in Hivert et al., 2016  
#' 
#' @return Return a nsnpsx3 matrix with SNP-specific MSG, MSP and nc
#' 
#' @examples
#' #
#' @export
.compute_snpFstAov <- function(refcount, totcount, hapsize, verbose) {
    .Call('_poolfstat_compute_snpFstAov', PACKAGE = 'poolfstat', refcount, totcount, hapsize, verbose)
}

#' @title block_sum
#' @name block_sum
#' @rdname block_sum
#'
#' @description
#' Sugar to compute the sum of a stat per block
#'
#' @param stat vector of n stat values
#' @param snp_bj_id integer n-length vector with block index (from 0 to nblock-1) of the stat value 
#'
#' @details
#'  Sugar to compute the sum of a stat per block
#' 
#' @return Return a vector of length nblocks containing the per-block sums of the input stat
#' 
#' @examples
#' #
#' @export
.block_sum <- function(stat, snp_bj_id) {
    .Call('_poolfstat_block_sum', PACKAGE = 'poolfstat', stat, snp_bj_id)
}
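
# Illustrative pure-R sketch (not part of the generated wrappers). The hypothetical helper
# .sketch_block_sum() shows what a per-block sum looks like with 0-indexed block ids.
# It assumes that the number of blocks is max(snp_bj_id) + 1 and that blocks without any
# value sum to 0; how the compiled code treats such cases is not documented here.
.sketch_block_sum <- function(stat, snp_bj_id) {
  nblocks <- max(snp_bj_id) + 1L
  # Declaring every block as a factor level keeps empty blocks in the output
  blk <- factor(snp_bj_id, levels = seq_len(nblocks) - 1L)
  as.vector(tapply(stat, blk, sum, default = 0))
}
# Possible usage: .sketch_block_sum(c(1, 2, 3, 4, 5), c(0, 0, 1, 3, 3))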

#' @title block_sum2
#' @name block_sum2
#' @rdname block_sum2
#'
#' @description
#' Sugar to compute the sum of a stat per block defined by a range of SNPs (allow treating overlapping blocks)
#'
#' @param stat vector of n stat values
#' @param snp_bj_id integer matrix of dim nblocks x 2 giving for each block the start and end stat value index 
#'
#' @details
#'  Sugar to compute the sum of a stat per block defined by a range of SNPs (allow treating overlapping blocks)
#' 
#' @return Return a vector of length nblocks containing the per-block sums of the input stat
#' 
#' @examples
#' #
#' @export
.block_sum2 <- function(stat, snp_bj_id) {
    .Call('_poolfstat_block_sum2', PACKAGE = 'poolfstat', stat, snp_bj_id)
}

#' @title poppair_idx
#' @name poppair_idx
#' @rdname poppair_idx
#'
#' @description
#' Compute the index of the pairwise comparison from the idx of each pop
#'
#' @param idx_pop1 Integer giving the (0-indexed) index of the first pop 
#' @param idx_pop2 Integer giving the (0-indexed) index of the second pop 
#' @param nidx Integer giving the total number of indexes (i.e., number of pops)
#'
#' @details
#' If idx_pop2 < idx_pop1, indexes are reversed
#' 
#' @return Return the (0-indexed) index for the row associated to the pairwise comparison in the ordered flat list of all (npop*(npop-1))/2 pairwise stats
#' 
#' @examples
#' #
NULL

#' @title bjack_cov
#' @name bjack_cov
#' @rdname bjack_cov
#'
#' @description
#' Compute the block-jackknife covariance between two stats
#'
#' @param stat1 Vector of block-jackknife values for the first stat
#' @param stat2 Vector of block-jackknife values for the second stat
#'
#' @details
#'  Compute the block-jackknife covariance between two stats with correction
#' 
#' @return Covariance values
#' 
#' @examples
#' #
NULL
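
# Illustrative pure-R sketch (not part of the generated wrappers). The hypothetical helper
# .sketch_bjack_cov() implements the textbook delete-one jackknife covariance with the
# usual (n-1)/n factor. Whether this is exactly the "correction" applied by the compiled
# bjack_cov is an assumption.
.sketch_bjack_cov <- function(stat1, stat2) {
  n <- length(stat1)
  # Cross-products of deviations from the mean over the n leave-one-out values
  (n - 1) / n * sum((stat1 - mean(stat1)) * (stat2 - mean(stat2)))
}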

#' @title compute_H1
#' @name compute_H1
#' @rdname compute_H1
#'
#' @description
#' Compute (uncorrected) 1-Q1 for each block-jackknife block (if any) and over all the SNPs (i.e., either within or outside blocks)
#'
#' @param refcount Matrix of nsnpxnpop with counts (genotype or reads) for the reference allele
#' @param totcount Matrix of nsnpxnpop with total counts or read coverages
#' @param nblocks Integer giving the number of block-jackknife blocks (may be 0 if no block-jackknife)
#' @param block_id Integer vector of length nsnps with the (0-indexed) id of the block to which each SNP belongs (-1 for SNPs outside blocks)
#' @param verbose Logical (if TRUE progression bar is printed on the terminal)
#'
#' @details
#' Compute all the (uncorrected) H1=1-Q1 for each block-jackknife block (if any) and overall SNPs (within or outside blocks). 
#' It is indeed more convenient to compute H1 (rather than Q1) to apply corrections afterwards within R function 
#' 
#' @return Return a matrix with npops rows and nblocks+1 columns giving the mean H1 of each pop within each block and for all SNPs (last column)
#' 
#' @examples
#' #
#' @export
.compute_H1 <- function(refcount, totcount, nblocks, block_id, verbose) {
    .Call('_poolfstat_compute_H1', PACKAGE = 'poolfstat', refcount, totcount, nblocks, block_id, verbose)
}

#' @title compute_Q2
#' @name compute_Q2
#' @rdname compute_Q2
#'
#' @description
#' Compute all Q2 for each block-jackknife block (if any) and overall SNPs (within or outside blocks)
#'
#' @param refcount Matrix of nsnpxnpop with counts (genotype or reads) for the reference allele
#' @param totcount Matrix of nsnpxnpop with total counts or read coverages
#' @param nblocks Integer giving the number of block-jackknife blocks (may be 0 if no block-jackknife)
#' @param block_id Integer vector of length nsnps with the (0-indexed) id of the block to which each SNP belongs (-1 for SNPs outside blocks)
#' @param verbose Logical (if TRUE progression bar is printed on the terminal)
#'
#' @details
#' Compute all Q2 for each block-jackknife block (if any) and overall SNPs (within or outside blocks). 
#' 
#' @return Return a matrix with npops*(npops-1)/2 rows and nblocks+1 columns giving the mean Q2 of each pairwise pop comp. within each block and for all SNPs (last column)
#' 
#' @examples
#' #
#' @export
.compute_Q2 <- function(refcount, totcount, nblocks, block_id, verbose) {
    .Call('_poolfstat_compute_Q2', PACKAGE = 'poolfstat', refcount, totcount, nblocks, block_id, verbose)
}

#' @title compute_F3fromF2
#' @name compute_F3fromF2
#' @rdname compute_F3fromF2
#'
#' @description
#' Compute all F3 from overall F2 values
#'
#' @param F2val Numeric vector of length nF2=(npop*(npop-1))/2 with all pairwise F2 estimates
#' @param Hval Numeric vector of length npop with all within pop heterozygosity estimates
#' @param npops Integer giving the number of populations
#'
#' @details
#' Compute F3 and F3star estimates from F2 (and heterozygosities)
#' 
#' @return Return a matrix with nF3=npops*(npops-1)*(npops-2)/2 rows and 2 columns corresponding to the F3 and F3star estimates
#' 
#' @examples
#' #
#' @export
.compute_F3fromF2 <- function(F2val, Hval, npops) {
    .Call('_poolfstat_compute_F3fromF2', PACKAGE = 'poolfstat', F2val, Hval, npops)
}
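
# Illustrative pure-R sketch (not part of the generated wrappers). The hypothetical helper
# .sketch_F3fromF2() uses the standard identity F3(Px;P1,P2) = (F2(Px,P1) + F2(Px,P2) - F2(P1,P2))/2
# and enumerates the npops*(npops-1)*(npops-2)/2 configurations (npops >= 3). It takes a symmetric
# F2 matrix rather than the flat F2val vector, because the ordering of that vector is not
# documented here. Scaling F3star as F3 divided by the target heterozygosity is an assumption,
# as is the row order of the output.
.sketch_F3fromF2 <- function(F2mat, Hval) {
  npops <- nrow(F2mat)
  out <- list()
  for (x in seq_len(npops)) {
    others <- setdiff(seq_len(npops), x)
    pairs <- utils::combn(length(others), 2)  # index pairs into 'others'
    for (k in seq_len(ncol(pairs))) {
      p1 <- others[pairs[1, k]]
      p2 <- others[pairs[2, k]]
      f3 <- (F2mat[x, p1] + F2mat[x, p2] - F2mat[p1, p2]) / 2
      out[[length(out) + 1L]] <- c(target = x, p1 = p1, p2 = p2, F3 = f3, F3star = f3 / Hval[x])
    }
  }
  do.call(rbind, out)
}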

#' @title compute_F3fromF2samples
#' @name compute_F3fromF2samples
#' @rdname compute_F3fromF2samples
#'
#' @description
#' Compute all F3 from F2 values obtained from each block-jackknife block
#'
#' @param blockF2 Numeric Matrix with nF2=(npop*(npop-1))/2 rows and nblocks columns matrix containing pairwise-pop F2 estimates for each block-jackknife sample (l.o.o.)
#' @param blockHet Numeric Matrix with npop rows and nblocks columns containing all within pop heterozygosity estimates for each block-jackknife sample (l.o.o.)
#' @param npops Integer giving the number of populations
#' @param verbose Logical (if TRUE progression bar is printed on the terminal)
#'
#' @details
#' Compute F3 and F3star estimates and their s.e. based on block-jackknife estimates of all F2 (and heterozygosities)
#' 
#' @return Return a matrix with nF3=npops*(npops-1)*(npops-2)/2 rows and four columns corresponding to the mean and the s.e. of F3 and the mean and s.e. of F3star
#' 
#' @examples
#' #
#' @export
.compute_F3fromF2samples <- function(blockF2, blockHet, npops, verbose) {
    .Call('_poolfstat_compute_F3fromF2samples', PACKAGE = 'poolfstat', blockF2, blockHet, npops, verbose)
}

#' @title generateF3names
#' @name generateF3names
#' @rdname generateF3names
#'
#' @description
#' Generate all names for F3 stats (same order as computation)
#'
#' @param popnames String vector with the names of all the pops
#'
#' @details
#' Generate all the npops*(npops-1)*(npops-2)/2 names for F3 stats (same order as computation)
#' 
#' @return Return a string matrix with 4 columns including the complete F3 configuration names (of the form Px;P1,P2), and the names of each pop involved in the configuration
#' 
#' @examples
#' #
#' @export
.generateF3names <- function(popnames) {
    .Call('_poolfstat_generateF3names', PACKAGE = 'poolfstat', popnames)
}

#' @title compute_F4fromF2
#' @name compute_F4fromF2
#' @rdname compute_F4fromF2
#'
#' @description
#' Compute all F4 from overall F2 values
#'
#' @param F2val Numeric vector of length nF2=(npop*(npop-1))/2 with all pairwise F2 estimates
#' @param npops Integer giving the number of populations
#'
#' @details
#' Compute F4 estimates from F2
#' 
#' @return Return a vector of length nF4=(npops*(npops-1)/2) * ((npops-2)*(npops-3)/2) / 2 corresponding to all the F4 estimates for all possible configurations
#' 
#' @examples
#' #
#' @export
.compute_F4fromF2 <- function(F2val, npops) {
    .Call('_poolfstat_compute_F4fromF2', PACKAGE = 'poolfstat', F2val, npops)
}
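
# Illustrative pure-R sketch (not part of the generated wrappers). The hypothetical helper
# .sketch_F4fromF2() applies the standard identity
# F4(P1,P2;P3,P4) = (F2(P1,P4) + F2(P2,P3) - F2(P1,P3) - F2(P2,P4))/2 to one quadruplet,
# and .sketch_nF4() evaluates the number of configurations quoted in the documentation above.
# A symmetric F2 matrix is assumed as input because the ordering of the flat F2val vector
# is not documented here.
.sketch_F4fromF2 <- function(F2mat, p1, p2, p3, p4) {
  (F2mat[p1, p4] + F2mat[p2, p3] - F2mat[p1, p3] - F2mat[p2, p4]) / 2
}
.sketch_nF4 <- function(npops) {
  # Unordered pairs times unordered pairs among the remaining pops, halved for symmetry
  (npops * (npops - 1) / 2) * ((npops - 2) * (npops - 3) / 2) / 2
}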

#' @title compute_F4fromF2samples
#' @name compute_F4fromF2samples
#' @rdname compute_F4fromF2samples
#'
#' @description
#' Compute all F4 from F2 values obtained from each block-jackknife block
#'
#' @param blockF2 Numeric Matrix with nF2=(npop*(npop-1))/2 rows and nblocks columns matrix containing pairwise-pop F2 estimates for each block-jackknife sample (l.o.o.)
#' @param npops Integer giving the number of populations
#' @param verbose Logical (if TRUE progression bar is printed on the terminal)
#'
#' @details
#' Compute F4 estimates and their s.e. based on block-jackknife estimates of all F2 (and heterozygosities)
#' 
#' @return Return a matrix with nF4=(npops*(npops-1)/2) * ((npops-2)*(npops-3)/2) / 2 rows and two columns corresponding to the mean and the s.e. of F4 estimates for all possible configurations
#' 
#' @examples
#' #
#' @export
.compute_F4fromF2samples <- function(blockF2, npops, verbose) {
    .Call('_poolfstat_compute_F4fromF2samples', PACKAGE = 'poolfstat', blockF2, npops, verbose)
}

#' @title compute_F4DfromF2samples
#' @name compute_F4DfromF2samples
#' @rdname compute_F4DfromF2samples
#'
#' @description
#' Compute all F4 and Dstat from F2 values obtained from each block-jackknife block
#'
#' @param blockF2 Numeric Matrix with nF2=(npop*(npop-1))/2 rows and nblocks columns matrix containing pairwise-pop F2 estimates for each block-jackknife sample (l.o.o.)
#' @param blockDenom Numeric Matrix with nF4=(npops*(npops-1)/2)*((npops-2)*(npops-3)/2)/2 rows and nblocks containing the estimates of the denominator of Dstat (see compute_blockDdenom) for each block-jackknife sample (l.o.o.) 
#' @param npops Integer giving the number of populations
#' @param verbose Logical (if TRUE progression bar is printed on the terminal)
#'
#' @details
#' Compute F4 and D estimates and their s.e. based on block-jackknife estimates of all F2 (and heterozygosities)
#' 
#' @return Return a matrix with nF4=(npops*(npops-1)/2)*((npops-2)*(npops-3)/2)/2 rows and four columns corresponding to the mean and the s.e. of F4 and the mean and s.e. of Dstat
#' 
#' @examples
#' #
#' @export
.compute_F4DfromF2samples <- function(blockF2, blockDenom, npops, verbose) {
    .Call('_poolfstat_compute_F4DfromF2samples', PACKAGE = 'poolfstat', blockF2, blockDenom, npops, verbose)
}

#' @title compute_blockDdenom
#' @name compute_blockDdenom
#' @rdname compute_blockDdenom
#'
#' @description
#' Compute the denominator of the Dstat for all quadruplet configuration and each block-jackknife block (if any) and overall SNPs (within or outside blocks)
#'
#' @param refcount Matrix of nsnpxnpop with counts (genotype or reads) for the reference allele
#' @param totcount Matrix of nsnpxnpop with total counts or read coverages
#' @param nblocks Integer giving the number of block-jackknife blocks (may be 0 if no block-jackknife)
#' @param block_id Integer vector of length nsnps with the (0-indexed) id of the block to which each SNP belongs (-1 for SNPs outside blocks)
#' @param verbose Logical (if TRUE progression bar is printed on the terminal)
#'
#' @details
#' Compute the denominator of the Dstat for all quadruplet configuration and each block-jackknife block (if any) and overall SNPs (within or outside blocks)
#' 
#' @return Return a matrix with nf4=(npops*(npops-1)/2)*((npops-2)*(npops-3)/2)/2 rows and nblocks+1 columns giving the mean Dstat-denominator (1-Q2ab)(1-Q2cd)
#'  for all quadruplet configuration and within each block-jackknife sample and over all SNPs (last column)
#' 
#' @examples
#' #
#' @export
.compute_blockDdenom <- function(refcount, totcount, nblocks, block_id, verbose) {
    .Call('_poolfstat_compute_blockDdenom', PACKAGE = 'poolfstat', refcount, totcount, nblocks, block_id, verbose)
}

#' @title generateF4names
#' @name generateF4names
#' @rdname generateF4names
#'
#' @description
#' Generate all names for F4 stats (same order as computation)
#'
#' @param popnames String vector with the names of all the pops
#'
#' @details
#' Generate all the nf4=(npops*(npops-1)/2)*((npops-2)*(npops-3)/2)/2 names for F4 stats (same order as computation)
#' 
#' @return Return a string matrix with 5 columns including the complete F4 configuration names (of the form P1,P2;P3,P4), and the names of each pop involved in the configuration
#' 
#' @examples
#' #
#' @export
.generateF4names <- function(popnames) {
    .Call('_poolfstat_generateF4names', PACKAGE = 'poolfstat', popnames)
}

#' @title compute_QmatfromF2samples
#' @name compute_QmatfromF2samples
#' @rdname compute_QmatfromF2samples
#'
#' @description
#' Compute the Qmat matrix (error covariance between all F2 and F3 measures) from F2 block-jackknife estimates
#'
#' @param blockF2 Numeric Matrix with nF2=(npop*(npop-1))/2 rows and nblocks columns matrix containing pairwise-pop F2 estimates for each block-jackknife sample (l.o.o.)
#' @param npops Integer giving the number of populations
#' @param verbose Logical (if TRUE progression bar is printed on the terminal)
#'
#' @details
#' Compute the error covariance matrix Qmat (between all F2 and F3 measures) from F2 block-jackknife estimates (by recomputing all F3 for all blocks)
#' 
#' @return Return the (nF2+nF3)*(nF2+nF3) error covariance (symmetric) matrix
#' 
#' @examples
#' #
#' @export
.compute_QmatfromF2samples <- function(blockF2, npops, verbose) {
    .Call('_poolfstat_compute_QmatfromF2samples', PACKAGE = 'poolfstat', blockF2, npops, verbose)
}

#' @title compute_snpQ1onepop
#' @name compute_snpQ1onepop
#' @rdname compute_snpQ1onepop
#'
#' @description
#' Compute SNP-specific Q1 for one pop
#'
#' @param refcount Vector of nsnp counts (genotype or reads) for the reference allele
#' @param totcount Vector of nsnp total counts or read coverages
#' @param weight Numeric (w=1 for allele count data and w=poolsize/(poolsize-1) for PoolSeq data)
#'
#' @details
#' Compute SNP-specific Q1 for one pop. sample. 
#' 
#' @return Return a vector of length nsnps with SNP-specific Q1
#' 
#' @examples
#' #
#' @export
.compute_snpQ1onepop <- function(refcount, totcount, weight) {
    .Call('_poolfstat_compute_snpQ1onepop', PACKAGE = 'poolfstat', refcount, totcount, weight)
}
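
# Illustrative pure-R sketch (not part of the generated wrappers). The hypothetical helper
# .sketch_snpQ1onepop() computes the within-sample probability of identity from counts and
# then applies the weight as 1 - w*(1 - Q1), so that w=1 leaves allele count data unchanged
# and w=poolsize/(poolsize-1) gives the Pool-Seq correction. That this matches the compiled
# code term by term is an assumption.
.sketch_snpQ1onepop <- function(refcount, totcount, weight) {
  altcount <- totcount - refcount
  # Probability that two distinct sampled copies (or reads) carry the same allele
  q1_raw <- (refcount * (refcount - 1) + altcount * (altcount - 1)) / (totcount * (totcount - 1))
  1 - weight * (1 - q1_raw)
}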

#' @title compute_snpQ2onepair
#' @name compute_snpQ2onepair
#' @rdname compute_snpQ2onepair
#'
#' @description
#' Compute SNP-specific Q2 for a single pair of samples
#'
#' @param refcount1 Vector of count (genotype or reads) for the reference allele in the first sample
#' @param refcount2 Vector of count (genotype or reads) for the reference allele in the second sample
#' @param totcount1 Vector of total count or read coverages in the first sample
#' @param totcount2 Vector of total count or read coverages in the second sample
#'
#' @details
#' Compute SNP-specific Q2 for a single pair of samples 
#' 
#' @return Return a vector of length nsnps with SNP-specific Q2
#' 
#' @examples
#' #
#' @export
.compute_snpQ2onepair <- function(refcount1, refcount2, totcount1, totcount2) {
    .Call('_poolfstat_compute_snpQ2onepair', PACKAGE = 'poolfstat', refcount1, refcount2, totcount1, totcount2)
}
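
# Illustrative pure-R sketch (not part of the generated wrappers). The hypothetical helper
# .sketch_snpQ2onepair() computes the probability that one copy (or read) drawn from each
# of two samples carries the same allele. It is assumed, not verified here, that the
# compiled code uses the same expression.
.sketch_snpQ2onepair <- function(refcount1, refcount2, totcount1, totcount2) {
  (refcount1 * refcount2 + (totcount1 - refcount1) * (totcount2 - refcount2)) /
    (totcount1 * totcount2)
}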

#' @title simureads_poly
#' @name simureads_poly
#' @rdname simureads_poly
#'
#' @description
#' Simulate read counts from count data
#'
#' @param y_count Integer Matrix with nsnp rows and npop columns giving allele counts at the reference allele
#' @param n_count Integer Matrix with nsnp rows and npop columns giving total counts
#' @param lambda Numeric Vector of length npop giving the expected coverage of each pool
#' @param overdisp Numeric value giving overdispersion of coverages and their distribution (see details)
#' @param min_rc Integer giving the minimal read count for an allele to be considered as true allele
#' @param min_maf Float giving the MAF threshold for SNP filtering
#' @param eps Numeric value giving the sequencing error
#' @param eps_exp Numeric value giving the experimental error leading to unequal contribution of individual to the pool reads
#' @details
#'  The function implements a simulation approach similar to that described in Gautier et al. (2021). Read coverages are sampled
#'  from a distribution specified by the lambda vector and the overdisp value. Note that overdisp is the same for all pop. samples but 
#'  lambda (expected coverages) may vary across pools. If overdisp=1 (default in the R function), coverages are assumed Poisson distributed
#'  and the mean and variance of the coverages for the pool are both equal to the value specified in the lambda vector. If overdisp>1, coverages
#'  follow a Negative Binomial distribution with a mean equal to lambda but a variance equal to overdisp*lambda. Finally, if overdisp<1,
#'  no variation in coverage is introduced and all coverages are equal to the value specified in the lambda vector 
#'  although they may (slightly) vary in the output when eps>0 due to the removal of error reads.
#'  The eps parameter controls the sequencing error rate. Sequencing errors are modeled following Gautier et al. (2021), i.e., read counts for the four
#'  possible bases are sampled from a multinomial distribution Multinom(c,\{f*(1-eps)+(1-f)*eps/3,f*eps/3+(1-f)*(1-eps),eps/3,eps/3\}) 
#'  where c is the read coverage and f the reference allele frequency (obtained from the count data).
#'  The experimental error eps_exp controls the contribution of each individual (assumed diploid) to the pools following the model described 
#'  in Gautier et al. (2013). The parameter eps_exp corresponds to the coefficient of variation of the individual contributions.
#'  When eps_exp tends toward 0, all individuals contribute equally to the pool and there is no experimental error. For example, 
#'  with 10 individuals, eps_exp=0.5 corresponds to a situation where 5 individuals contribute 2.8x more reads than the five others.
#'  Note that the number of (diploid) individuals for each SNP and pop. sample is deduced from the input total count 
#'  (it may thus differ over SNPs when the total counts are not the same). 
#'  
#' @return Return an Integer matrix with nsnp rows and 2*npop columns (1:npop=ref allele readcount; (npop+1):2*npop=coverage)  
#' 
#' @examples
#' #
#' @export
.simureads_poly <- function(y_count, n_count, lambda, overdisp, min_rc, min_maf, eps, eps_exp) {
    .Call('_poolfstat_simureads_poly', PACKAGE = 'poolfstat', y_count, n_count, lambda, overdisp, min_rc, min_maf, eps, eps_exp)
}
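
# Illustrative pure-R sketch (not part of the generated wrappers). The hypothetical helper
# .sketch_simureads_one() draws the four base counts for a single SNP in a single pool
# following the coverage and sequencing-error model described above. The min_rc and min_maf
# filters and the experimental error (eps_exp) are deliberately left out, and rounding lambda
# when overdisp<1 is an assumption.
.sketch_simureads_one <- function(f, lambda, overdisp = 1, eps = 0) {
  # Coverage: Negative Binomial with variance overdisp*lambda, Poisson, or fixed
  coverage <- if (overdisp > 1) {
    stats::rnbinom(1, mu = lambda, size = lambda / (overdisp - 1))
  } else if (overdisp == 1) {
    stats::rpois(1, lambda)
  } else {
    round(lambda)
  }
  # Multinomial probabilities for: reference, alternate and the two remaining bases
  prob <- c(f * (1 - eps) + (1 - f) * eps / 3,
            f * eps / 3 + (1 - f) * (1 - eps),
            eps / 3, eps / 3)
  counts <- stats::rmultinom(1, size = coverage, prob = prob)[, 1]
  names(counts) <- c("ref", "alt", "err1", "err2")
  counts
}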

#' @title simureads_mono
#' @name simureads_mono
#' @rdname simureads_mono
#'
#' @description
#' Simulate read counts for monomorphic position when there is sequencing error
#'
#' @param npos Integer giving the number of positions (close to genome size)
#' @param npop Integer giving the number of population samples
#' @param lambda Numeric Vector of length npop giving the expected coverage of each pool
#' @param overdisp Numeric value giving overdispersion of coverages and their distribution (see details)
#' @param min_rc Integer giving the minimal read count for an allele to be considered as true allele
#' @param min_maf Float giving the MAF threshold for SNP filtering
#' @param eps Numeric value giving the sequencing error
#' @details
#' The function implements a simulation approach similar to that described in Gautier et al. (2021). Read coverages are sampled
#' from a distribution specified by the lambda vector and the overdisp value. Note that overdisp is the same for all pop. samples but 
#' lambda (expected coverages) may vary across pools. If overdisp=1 (default in the R function), coverages are assumed Poisson distributed
#' and the mean and variance of the coverages for the pool are both equal to the value specified in the lambda vector. If overdisp>1, coverages
#' follow a Negative Binomial distribution with a mean equal to lambda but a variance equal to overdisp*lambda. Finally, if overdisp<1,
#' no variation in coverage is introduced and all coverages are equal to the value specified in the lambda vector 
#' although they may (slightly) vary in the output when eps>0 due to the removal of error reads.
#' The eps parameter controls the sequencing error rate. Sequencing errors are modeled following Gautier et al. (2021), i.e., read counts for the four
#' possible bases are sampled from a multinomial distribution Multinom(c,\{1-eps,eps/3,eps/3,eps/3\}) 
#' where c is the read coverage. Only bi-allelic SNPs (after considering min_rc) with MAF>min_maf are included in the output.
#'  
#' @return Return an Integer matrix with nsnp rows and 2*npop columns (1:npop=ref allele readcount; (npop+1):2*npop=coverage)  
#' 
#' @examples
#' #
#' @export
.simureads_mono <- function(npos, npop, lambda, overdisp, min_rc, min_maf, eps) {
    .Call('_poolfstat_simureads_mono', PACKAGE = 'poolfstat', npos, npop, lambda, overdisp, min_rc, min_maf, eps)
}
poolfstat documentation built on April 4, 2025, 1:49 a.m.