- Fixed a bug in `plot_pairwise_fst_heatmap()` when running multiple facets without facet orders or when significance had been calculated for some but not all facets.
- Fixed a bug in `plot_pairwise_fst_heatmap()` where plotting was not always strictly on the upper diagonal.
- Fixed `mean_fst` values when some SNPs returned NA.
- Fixed `calc_ne()` with a space in the `NeEstimator_path` variable.
- Fixed bugs with `do_bootstraps()` and bootstrap fetching with `get.snpR.stats()`, mostly stemming from the changes to window behavior in 1.2.9. These involved either "_" in subfacet names or a few other oddities. "...." is now used as an internal separator, which, since "." is restricted anyway, should be OK. Added tests to catch these cases.
- Known issue: `get.snpR.stats()` has issues returning an allele frequency matrix alongside other statistics. It returns just an allele frequency matrix perfectly fine.
- Added `calc_allelic_richness()` to calculate allelic richness via rarefaction according to Hurlbert 1971.
- Added `calc_private()` to detect private alleles, with rarefaction according to Smith and Grassle 1977 by default. This behavior can be controlled with the `rarefaction` argument if the `raw` private alleles are desired instead.
- Added `calc_seg_sites()` to calculate the number of segregating sites, optionally via rarefaction by an in-house approach (which will probably be published in a small note if we cannot find it documented elsewhere).
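Hurlbert-style rarefaction asks how many distinct alleles we would expect to see if only `g` gene copies were drawn without replacement. A minimal base-R sketch of that formula (an illustration only, not snpR's actual implementation; the function name is hypothetical):

```r
# Expected number of distinct alleles when drawing g gene copies (without
# replacement) from N total copies, where counts[i] is the number of copies
# of allele i (Hurlbert 1971). Illustrative sketch only; snpR's internal
# implementation may differ.
rarefied_richness <- function(counts, g) {
  N <- sum(counts)
  # P(allele i absent from a sample of g) = choose(N - n_i, g) / choose(N, g)
  sum(1 - choose(N - counts, g) / choose(N, g))
}

# Example: 3 alleles with copy counts 10, 5, and 1, rarefied to 8 copies
rarefied_richness(c(10, 5, 1), 8)
```

Rarefying to the full sample size recovers the raw allele count, which is why comparisons across populations use the smallest shared `g`.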
- Added `calc_global_fst()` to calculate $F_{ST}$ globally across all facet subfacets.
- Changes to `calc_pairwise_ld()` provide a fast alternative to global LD if the user is interested only in local LD fluctuations.
- Added the ability for `filter_snps()` to remove loci out of LD within windows.
- Windows may now extend (if the `triple_sigma` argument is set) beyond the end of the chromosome. This ensures that SNPs at the ends of chromosomes are properly within windows, although it will cause truncated windows. This is a trade-off, but doing things this way ensures proper filtering when using `filter_snps()` to do LD pruning.
- Facets (e.g. `pop.fam` or `pop.chr`) will now be double-checked for any duplicates, with an error if any are detected. This is a bit slower but safer.
- Added the `facet.order` argument to the `plot_pairwise_fst_heatmap()` function to control the order of subfacet plotting.
- Using the `facets` argument with `format_snps(format = "vcf")` will now write a file with each possible metadata level.
- Added the `cleanup` argument to `calc_association()` to allow the intermediate files from `GMMAT` to be retained.
- `plot_clusters()`, `plot_manhattan()`, and `plot_pairwise_ld_heatmap()` now have a `simplify_output` argument which can be used to return only the `ggplot` objects, not the data used to produce them. The data otherwise in `$data` is redundant, since `ggplot2` objects already contain the data in `$data`. The default behavior remains the same, since otherwise some old scripts would need to be revised following this change.
- `plot_clusters()` with `simplify_output = FALSE` will now also return all of the PCA loadings (for all PCs) in `$pca_loadings`, provided a PCA was requested.
- `calc_pairwise_fst()` and `calc_global_fst()` now both return raw, un-weighted $F_{ST}$ averages as well as the usual $n_k$-weighted averages. While weighted averages should perform better with any missing data, some users may desire the classical solution.
- Updated position printing in `filter_snps()` and `format_snps()` to avoid any scientific notation. Added `contig` flags to the headers printed with `format = "vcf"`.
- Fixed a bug in `plot_pairwise_ld_heatmap()` when comparing only two SNPs on a contig/chr/etc.
- Fixed a bug in `subset_snpR_data()` due to a typo in stats passing that occasionally caused issues.
- Fixed a bug in `calc_association()` when using a formula where snpR was expecting a character for splitting reasons.
- Fixed a bug causing `ranger` to fail if using the `par` argument to `run_random_forest()`.
- Fixed a bug where `plot_structure()` wouldn't properly report its citation information.
- Fixed `calc_smoothed_averages()` to not throw an error when requesting both single and pairwise stats with `stats.type` if only one of the two has been calculated.
- Fixed a bug where the `mgc` filter in `filter_snps()`, with data with only one option for a heterozygous genotype across all loci, would produce an error due to matrix dropping.
- Fixed an issue with loci removed during `snpRdata` object creation due to not being bi-allelic.
- Known issue: `get.snpR.stats()` has issues returning an allele frequency matrix alongside other statistics. It returns just an allele frequency matrix perfectly fine.
- Fixes to `calc_ne()`.
- Applied filters are now recorded and accessible via `filters()`. `format_snps()` with the `vcf` output option will now automatically append these filters to the VCF header. Added two internal options to `import.snpR.data` to allow for this and fixed a bit of un-needed redundancy in the process.
- Added the `inds_first` argument to `filter_snps()` to allow users to filter on individuals prior to loci. The default remains to filter on loci first for consistency.
- Added the `remove_garbage` argument to `filter_snps()` to allow users to quickly remove very poorly sequenced loci and individuals (jointly, so neither first) prior to applying all of the other filters. This can remove bias caused by very bad loci or samples making other loci or samples appear more poorly sequenced than they actually are.
- Updated `read_structure()` to omit non-biallelic loci instead of erroring. This will be changed later to optionally allow them once non-biallelic support is fully added.
- Updated `subset_snpR_data()` to allow naming the facets and subfacets to subset with the `.facets`, `.subfacets`, `.snp.facets`, and `.snp.subfacets` arguments. These were previously used instead of the current syntax but were more cumbersome and were replaced. They have been re-added because they are useful when the facet is stored as an object and then called (due to pipeline scripts, etc.), which is otherwise not easily done. See the documentation for `subset_snpR_data()` for examples. This older syntax is not available via the bracket operator (`[`).
- Fixes to `do_bootstraps()`, `calc_basic_snp_stats()`, and `calc_fis()`.
- Updated `summarize_facets()` to return a table of counts of options when provided sample facets.
- Fixed the `keep_components` argument of `calc_fis()`.
- Fixed the `facet` argument for `calc_tree()`.
- Fixed a bug in `plot_manhattan()` where facet specification was not working correctly for data plotted from a `data.frame` rather than a `snpRdata` object directly.
- Fixed a bug where running `plot_pairwise_ld()` in parallel with the `CLD` option would error after completion.
- Fixed `dartR` object import, which should have been natively handled by `convert_genlight()`, due to the use of `%in% class()` instead of `methods::is()`. Fixed a few other uses of `%in% class()` to stave off future errors.
- Added the `mDat` argument to `read_structure()` to support slightly unorthodox structure files with different missing data identifiers. Note that most structure datasets will have `-9`, which is still the default.
- Added an `na.omit()` call to marker name parsing with `read_structure()` to support oddly formatted data with tabs or spaces separating locus names.
- Fixed a bug in `calc_ne()` where errors midway through computation would sometimes result in the working directory being changed to `./NeEstimator`.
- Fixed a bug in `calc_pairwise_fst()`. The fix isn't perfectly efficient if multiple facets referring to the same sample facet are all passed at once, due to bootstrapping needs.
- Fixed sorting with `format_snps()`: snpR sorts internally by position first (since the chromosome column doesn't have a fixed name and is supplied as a facet to functions), but VCF files should sort by `#CHROM`.
- Fixed a bug in `run_random_forest()` where non-polymorphic SNPs would cause an error.
- Improved `dapc` option sanity checks in `plot_clusters()`.
- Fixed a bug where `read_genepop()` would error with space-delimited genepop files.
- Removed the `dartR` dependency in favor of re-distribution of the `gl2gi()` function for reading in `genlight` objects, with author permission.
- Updated `import.snpR.data()` (and `process_genlight()`) to rely on `dartR`'s `gl2gi()` function to import `genlight` objects, since these can be a bit variable. This does mean adding a suggests for `dartR`, unfortunately. Also fixed typos in both `process_genlight()` and `process_genind()` that were causing the wrappers to fail.
- Updated `calc_fis()` to take the ratio of average variance components instead of the average of ratios, following the recommendations of Bhatia et al. 2013 and in line with the new behavior of `calc_pairwise_fst()`.
- The `keep_components` argument was added to `calc_fis()` to return the "b" and "c" variance components for each locus for later processing. Brief instructions were added to the documentation for `calc_fis()` to explain this process. This brings `calc_fis()` behavior fully in line with `calc_pairwise_fst()` for bi-allelic markers, although it still needs to be fixed for poly-allelic markers (which are not yet supported on the front-end).
- Updated `sample.meta()<-` (setting new sample meta) to intelligently update `snpRdata` objects by removing only calculated statistics and summary tabulations that applied to any changed facets, instead of simply re-importing the entire dataset as before. This should substantially speed up this function for large data sets.
- Added the `smart_PCA` option to `plot_clusters()`. This will use Patterson et al. (2006)'s methods (with Price et al. (2006)'s allele frequency estimation) for centering and scaling genotypic data for PCA/tSNE/umap construction. This generally doesn't change much unless there is a lot of missing data, since this approach avoids imputation.
- Updated for sequoia V 2.5.3, which adds 'Year.last' (a cutoff for an individual's reproductive window) to `format_snps()`. Returned `sequoia` to 'suggests'. Incorporated the `sequoia` function `GetMaybeRel` in the `run_sequoia()` wrapper. Note that this still needs unit tests, and so should be treated as in development.
- Updated `plot_structure_map()` to take additional `ggplot2` layers directly and plot them prior to the pie charts, instead of taking `sf` objects and trying to guess what the user wanted to do with them. This makes things considerably more flexible and makes it much easier to do things like plot precipitation/etc. under the pie charts, although it means the user needs to be a bit more savvy. Updated the documentation to reflect this.
- Added the `nsnps` argument to `calc_ne()` to do automatic subsetting to run with fewer SNPs while still merging results into the original dataset. Note that this isn't terribly quick at the moment, since it passes to the still somewhat inefficient subset operator `[`. With reasonably small numbers of SNPs, like what is usually suggested for LDNe, it should be fine.
- Updated `get.snpR.stats()` for IBD (`ibd`) and Tajima's D (`tsd`, `d`) to make it easier to fetch the correct values. Also adjusted the requested statistics to send everything to lower case, so things like `LD`, `D`, or `He` will still work.
- Added the `global` argument to `calc_tajimas_d()` for calculating global, non-windowed Tajima's D values across the entire genome.
- Added the `mgc` option to `filter_snps()` to filter on the minimum number of individuals with a minor allele (regardless of genotype). This is particularly useful if you want to remove alleles present in only one individual instead of straight singletons.
- Clarified the documentation of the `mac` argument for `filter_snps()` to note that it will remove loci where the minor allele count is less than or equal to the specified integer. So `mac = 1` will remove singletons.
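What minor-allele-count filtering does can be sketched in base R on a plain dosage matrix (illustrative only; `filter_snps()` operates on `snpRdata` objects, and `mac_filter` and `geno` here are hypothetical names):

```r
# Minor allele count (MAC) filtering on a loci-x-samples dosage matrix,
# where genotypes are coded 0/1/2 copies of the alternate allele and NA is
# missing. Sketch of the concept only, not snpR's actual code.
mac_filter <- function(geno, mac = 1) {
  alt   <- rowSums(geno, na.rm = TRUE)       # alternate allele copies per locus
  total <- 2 * rowSums(!is.na(geno))         # total called allele copies
  minor <- pmin(alt, total - alt)            # minor allele count
  geno[minor > mac, , drop = FALSE]          # drop loci where MAC <= mac
}

geno <- rbind(c(0, 0, 1),   # singleton: minor allele seen once -> dropped
              c(0, 1, 1),   # MAC = 2 -> kept
              c(2, 2, 1))   # minor allele is the reference here; MAC = 1 -> dropped
nrow(mac_filter(geno, mac = 1))   # only the second locus survives
```

Note that the minor allele is determined per locus, so a locus that is nearly fixed for the alternate allele (third row) is still a singleton.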
- Fixes to `merge_snpRdata()`.
- Fixed a case where `merge_snpRdata()` would produce weird results.
- Fixed a case where `merge_snpRdata()` would produce an error.
- Fixed a bug where tabulating `geno.table` data during `snpRdata` facet tabulation would error when a facet level has no non-missing genotypes.
- Fixed a bug where running the `.base` facet alongside other facets with `calc_pairwise_ld()` with the `CLD` option would cause an error during merging.
- Fixed a bug where running the `.base` facet alongside other facets with different SNP-level facets during `calc_smoothed_averages()` would cause errors.
- Fixed a bug where `calc_pairwise_ld()` would fail to return a proximity table if there were `NA` values in the sample metadata.
- Fixed a bug where `calc_sfs()` and other SFS functions would error with some but not all data sets due to issues when adding `anc` and `ref` columns without using `snp.meta(x)<-`. Existing tests didn't catch this because it didn't occur with the `stickSNPs` test data or other test data sets.
- Fixed a bug where `plot_diagnostic()` would fail if run with a new facet and the `maf` plot option but not the `fis` plot option.
- Fixed a bug when `plot_structure()` was run with `facet = ".base"`, which should be treated like `facet = NULL`.
- `plot_diagnostic()` no longer plots anything for missingness (instead of `NA`) if there is no missing data!
- Fixed a bug when `calc_pairwise_fst()` is run with a facet with only one level.
- Fixed a bug in `format_snps()` with the `output = "rafm"` option without facets.
- Fixed use of `pos` instead of `position` from `read_plink()`.
- Fixed a bug where `calc_smoothed_averages()` and `calc_tajimas_d()` would do a step 100 times larger than expected if the default was used! Thus the hotfix.
- Updated `plot_diagnostic()` and added manual control of which plots to generate. The SFS is skipped by default. Improved documentation and testing a bit.
- Updated `calc_pairwise_fst()` to take the ratio of average variance components instead of the average of ratios, following the recommendations of Bhatia et al. 2013.
- The `keep_components` argument was added to `calc_pairwise_fst()` to return the "a", "b", and "c" variance components for each locus for later processing. Brief instructions were added to the documentation for `calc_pairwise_fst()` to explain this process.
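The difference between averaging per-locus ratios and taking the ratio of averaged variance components (Bhatia et al. 2013) can be illustrated with toy numbers in base R (made-up values, not snpR output):

```r
# Toy per-locus variance components for a Weir-Cockerham-style Fst:
# a = among-population component, total = a + b + c. Values are invented
# purely for illustration.
a     <- c(0.02, 0.10, 0.00)   # among-population components per locus
total <- c(0.20, 0.40, 0.15)   # total variance per locus

avg_of_ratios <- mean(a / total)       # old behavior: mean of per-locus Fst
ratio_of_avgs <- sum(a) / sum(total)   # new behavior: ratio of averages

c(avg_of_ratios = avg_of_ratios, ratio_of_avgs = ratio_of_avgs)
```

The ratio of averages lets loci with larger total variance contribute more, which is why the two estimators can diverge noticeably on real data.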
- Added functionality to `plot_clusters()` via an interface to adegenet, with some code pulled in from the (thankfully GPL-v3) ade4 package. The license note was adjusted to reflect the ade4 code.
- Reduced the size of `snpRdata` objects by removing some duplicated information. Old objects should still work fine, but new objects will be considerably smaller. This also improves `snpRdata` object creation times!
- Added `merge_snpRdata()` to merge `snpRdata` objects using syntax equivalent to base R's `merge()` function. This can still be made more computationally efficient in the future by avoiding some internal summary tabulation.
- Updated `calc_smoothed_averages()` to be more memory efficient (but slightly slower) when working with large datasets. Added `triple_sigma` and `gaussian` arguments that determine if $\sigma$ is tripled to have windows with a full size of $6\sigma$ and whether Gaussian smoothing is actually used. Added more info to `get.snpR.stats()` window returns.
- `calc_tajimas_d()` will now also return the number of raw segregating sites per window (which was already internally calculated, since it is a part of Watterson's theta, but not returned).
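Gaussian-smoothed window averages weight each SNP by its distance from the window center; with tripled sigma the window spans $3\sigma$ on each side ($6\sigma$ total). A base-R sketch of the weighting (illustrative; snpR's exact kernel and implementation may differ, and `gaussian_window_mean` is a hypothetical name):

```r
# Gaussian smoothing weight for SNPs at positions `pos` around a window
# centered at `center`, with smoothing parameter `sigma` (in bp). SNPs
# beyond 3 * sigma (a full window width of 6 * sigma) get weight 0.
gaussian_window_mean <- function(stat, pos, center, sigma) {
  d <- abs(pos - center)
  w <- exp(-d^2 / (2 * sigma^2))   # Gaussian kernel
  w[d > 3 * sigma] <- 0            # truncate at the window edge
  sum(w * stat) / sum(w)           # weighted average of the statistic
}

pos  <- c(100, 500, 900, 5000)     # bp positions; the last SNP falls outside
stat <- c(0.1, 0.2, 0.3, 0.9)      # e.g. per-SNP pi
gaussian_window_mean(stat, pos, center = 500, sigma = 400)
```

SNPs near the window center dominate the average, so nearby outliers matter more than distant ones.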
- Window spacing: `2*sigma` (non-overlapping windows).
- Added the `mac` argument to `filter_snps()` to filter by minor allele count instead of minor allele frequency. This currently doesn't support faceting, which will probably be added later depending on user need. The `singletons` argument is now deprecated.
- Added the `hwe_excess_side` argument to `filter_snps()`, enabling users to only remove SNPs out of HWE that have either het or hom excesses. The default behavior is still to do both.
- Fixes to `calc_pairwise_fst()`.
- Added an option to `format_snps()` with the `plink` option. For some cases with many scaffolds/etc., this may be necessary. Before, this was the default behavior, not an option. To account for this, also added checks to ensure that, if this option is not used, no chromosome names start with numbers (leading numbers will be replaced with a character equivalent: 0 -> A, 1 -> B, ..., 9 -> I).
- Updated `plot_clusters()` to use different shapes instead of different fill/color combos as long as there are fewer than 25 levels (the number of unique point shapes). This makes for substantially easier-to-interpret levels in plots! Which facet gets shapes is controlled by the `shape_has_more_levels` argument.
- Added the `verbose` option to `check_duplicates()`. It's still a slow function and could use some work or parallelization.
- Added `lambda_gc_correction` arguments to `plot_qq()` and `plot_manhattan()` to generate $\lambda_{GC}$ genomic stratification measures and correct for them in plots (see this paper).
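$\lambda_{GC}$ is conventionally the median observed association $\chi^2$ statistic divided by the null median for 1 df (about 0.455). A generic base-R sketch from p-values (not snpR's internal code; `lambda_gc` is a hypothetical name):

```r
# Genomic inflation factor lambda_GC from a vector of association p-values:
# convert p-values to 1-df chi-square statistics, then compare the observed
# median to the theoretical null median. Values well above 1 suggest
# stratification or other inflation.
lambda_gc <- function(p) {
  chisq <- qchisq(p, df = 1, lower.tail = FALSE)  # observed test statistics
  median(chisq) / qchisq(0.5, df = 1)             # null median ~= 0.4549
}

set.seed(42)
lambda_gc(runif(10000))   # uniform (null) p-values give lambda near 1
```

Correcting then typically means dividing each $\chi^2$ statistic by $\lambda_{GC}$ before recomputing p-values.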
- Added the `header_cols` argument to `import.snpR.data()`.
- Updated `calc_tajimas_d()` examples to use the updated `get.snpR.stats()` syntax.
- Fixed `calc_ne` when it is run without the `chr` argument or with more than 5,000 SNPs.
- Added a message to `calc_ne` that shows if no output files are generated.
- Updated the documentation of `calc_prop_poly` to note that `calc_tajimas_d` will also calculate the number of segregating sites (and the number of SNPs) in a window, which can be used to easily calculate the prop_poly per window.
- Updated `import.snpR.data()` to return an error if the genotypic dimensions are not correct for the provided SNP and sample metadata, instead of proceeding and returning a bogus result.
- Fixed a bug where running `plot_structure()` with the `method = "structure"` option would fail due to an incorrectly set "LOCISPOP" flag (which is checked even if not using the locprior option).
- Fixes to `calc_hs()`.
- Fixed handling of genotypes stripped by `vcfR`, which meant that they weren't being properly accounted for as missing data by snpR. Most functions actually still work fine, but a few, like `plot_structure()`, were unhappy.
- Added `normalizePath()` to `calc_ne` to normalize the NeEstimator path.
- Fixed the `coan` option for NeEstimator.
- Fixed a bug where running the `.base` facet with `filter_snps()` would cause an error and a spurious warning.
- Updated the `facet` arguments to `filter_snps()` to use `NULL` as a default.
- Added the `gradient_colors` argument to `plot_pairwise_ld_heatmap()` to allow for custom sets of colors for the scale.
- Updated `calc_ne()` to catch full file paths given to `outfile`.
- Changed the `geom` used in `plot_pairwise_ld_heatmap()` to `geom_bin2d()` to prevent points from disappearing if too many loci are plotted on a chr.
- Updated `read_structure()` (and `import.snpR.data()`) to never assume sample names, just loci names (which is the standard) if noted.
- Fixes to `calc_ne()`.
- Fixed handling of genotypes stripped by `vcfR`, which meant that they weren't being properly accounted for as missing data by snpR. Most functions actually still work fine, but a few, like `plot_structure()`, were unhappy.
- Added `calc_prop_poly()` to calculate the proportion of polymorphic loci for a given pop.
- Changed `plot_tree()` to `calc_tree()` and removed automatic plot generation, since `ggtree`, which that depended on, can behave a bit oddly sometimes. An example for generating a plot with `ggtree` was added to the examples section of `calc_tree()`'s documentation. `ggtree` is no longer a suggested dependency.
- Added `summarize_facets()`, which provides information on either the possible facets or provided specific facets for a given `snpRdata` object.
- Fixes to `filter_snps()`, including HWE filtering.
- Added the `ncp` and `ncp.max` options to `run_random_forest()` and `run_genomic_prediction()`. Previously, selecting the `iPCA` option would work, but run with the default `iPCA` options and thus determine `ncp` internally, which is pretty slow.
- Updated `plot_manhattan()` to allow for easy plotting of gene positions, etc. under plots. Both ribbon-style and classic rug-style are supported.
- Moved `run_sequoia()` and `plot_structure_map()` to a secondary github repo, since both of these use CRAN-unfriendly dependencies (sequoia and sf, respectively). These functions now source this code (after asking for permission) in order to run, allowing the dependencies to be dropped. This should not change the user experience whatsoever.
- Updated `get.snpR.stats()` to return an informative warning if an empty list is returned.
- Updates to `format_snps()`: `openxlsx` was added to the `Suggests` field of the `DESCRIPTION`, as needed for `.xlsx` file creation.
- Fixes to `calc_ne()`.
- Documented the STRUCTURE format snpR expects, and added the missing STRUCTURE import file description to the documentation for `import.snpR.data()`.
- Fixed bugs in `calc_ne()` where pop names were not being properly handled and het/coan method results were not being returned by `get.snpR.stats()`.
- Fixed a bug where `.get.task.list()` would sometimes add a space between facet levels when pasting, due to a weird `t()` bug. To the user, this might have occasionally resulted in weird behavior when using multiple SNP facets with mixed numeric and character classes.
- Fixed a bug in `RefManageR`, apparently introduced recently, where `bibentry` objects wouldn't correctly write. Eliminated the `RefManageR` dependency; snpR now just uses `rbibutils` for reading and writing, and, when calling `citations()`, just spits out the full citation rather than the inline for the "Citation: " line. Not ideal, but doesn't need `RefManageR`.
- Fixed a bug in `calc_genetic_distances()` where the "Nei" method wouldn't work (due to a typo in the code).
- Fixed issues with prompting `missMDA` installation or reporting `iPCA` as an option if an invalid method was provided.
- Fixed a bug where the `subfacet` and `facet` columns in the `stats` slot of a `snpRdata` object would get flipped in order during `calc_association()`, resulting in the addition of a bunch of empty data rows when something else was merged in. This would produce a downstream error during `calc_pairwise_fst` (and potentially elsewhere) due to attempting to `cbind` the `stats` slot to the `facet_meta` slot. Note that this wouldn't cause any bad stats to be calculated or returned anywhere, due to the way that `get.snpR.stats()` functions to fetch results!
- Fixed a bug where `plot_manhattan()` wouldn't correctly plot Tajima's D (it didn't look for it in the right place).
- Fixed a bug where `plot_manhattan()` would display the facet name even if there was only one possible level (for example, if the sample facet was the `.base` level).
- Fixed a bug where `read_vcf()` would fail due to a typo in a sanity check.
- Fixed `calc_sfs()` with `fold = FALSE` and `calc_abba_baba()`.
- Added `read_structure()` to read STRUCTURE-formatted files. Added auto-read of .str files to `import.snpR.data()`.
- Added `calc_abba_baba()` to do ABBA/BABA tests, including block jackknifing for significance.
- Added `calc_diagnostic()` to plot basic diagnostic plots for snpRdata objects.
- Re-packaged code from the `pophelper` package, since it's not on CRAN. Switched to a GPL license to allow this. The package is still cited automatically when generating a citation during `plot_structure()` or `plot_structure_map()`. Added a dependency on `stats` to allow for the re-packaging.
- Fixes to `calc_ne()`.
- Removed the `stringi`, `pkgcond`, `stringr`, `tidyr`, and `CATT` dependencies. All had only one or a few used functions that were not time intensive, and so could be home-brewed easily to avoid the dependency.
- Changed `stickSNPs` to be smaller: only 100 loci and 100 samples now.
- Added the `verbose` argument to `calc_ne()`.
- Removed the `snpR_association` vignette, since it required the `GMMAT`, `BGLR`, and `ranger` R packages to be installed, but they are only suggested, not required. This could cause vignette building to fail. May re-tool later to just use the internally implemented association tests.
- Fixed a bug in `calc_association` if the major allele differed between the case and control.
- `calc_ne()` and `run_colony()` actually clean up now if asked on Windows.
- Fixed the `verbose` argument on `run_colony()` to actually work on Windows.
- Added the `calc_abba_baba()` function.
- Added `calc_he()` for traditional HE = 2pq calculation. Note that this produces results almost identical to `calc_pi()`.
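The HE = 2pq formula is simple enough to sketch directly in base R (formula illustration only; `expected_het` is a hypothetical name, not snpR's API):

```r
# Expected heterozygosity (He = 2pq) per locus from allele frequencies.
# For a bi-allelic locus q = 1 - p, so He peaks at 0.5 when p = 0.5 and
# drops to 0 at fixation.
expected_het <- function(p) 2 * p * (1 - p)

expected_het(c(0.5, 0.1, 0.0))   # -> 0.50 0.18 0.00
```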
- Added `calc_hs()` for Coltman et al. (1999)'s individual heterozygosity.
- Changed `calc_SFS()` and `calc_pairwise_LD_heatmap()` to lowercase for consistency.
- Removed the `sfs` arguments in `plot_sfs()` and `calc_directionality()`. They now just take provided SFS objects as `x`, consistent with other functions.
- Updated `calc_pairwise_fst()` bootstrapping to be more memory efficient.
- Added the `chr_order` argument to `plot_manhattan()` to allow for manual re-sorting of chromosomes (since factors are coerced away in `snpRdata` objects).
- Added the `highlight_style` argument to `plot_manhattan()` to allow for coloring SNPs instead of labeling them if highlighted.
- Added a `verbose` option (defaulting to `TRUE`) for `filter_snps()` to suppress all of the filtering reports.
- Added a `NEWS.md` file to track changes to the package.
- Fixed a bug in `run_random_forest()` where formulas would be incorrectly specified the first time a line of code was run, due to a weird environment scope issue.
- Fixed a bug in `get.snpR.stats()` where requesting fst values from both a facet with and a facet without fst calculated would throw an error during fst matrix construction. Implemented a test.
- Fixes to `run_random_forest()`.
- Fixed a bug where filtering a `snpRdata` object with `filter_snps()` such that no individuals or SNPs remained would result in an uninformative error. Added a test to check error messages here and in `subset_snpR_data()` for this.
- Updated `format_snps()` to allow BYmax and BYmin columns to get translated to BY.max and BY.min, since periods aren't allowed in snpRdata metadata columns.
- Fixed a bug where `calc_het_hom_ratio()` with complex facets would throw an error.