* `tidyr`, `readr`, using `future`, `carrier`.
* `assignment_ngs`
* `pkgdown`
* `assignment_ngs`
* `fst_WC84`: work faster
* `assigner` with `SeqArray` and GDS object/file
* `fst_WC84`: work with radiator v.1.0
* `assigner` and now lives exclusively in package grur
* `assignment_mixture` generated by `purrr::df` replaced recently by `purrr::dfr`. Changed DESCRIPTION field accordingly.
* `subsample` argument in `assignment_ngs` and `assignment_mixture` can now automatically detect the smallest sample size in the data's grouping, so you can use `subsample = "min"` to let the function decide (if you're not sure).
* `assigner` from using `stackr` -> `radiator`
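A minimal sketch of the `subsample = "min"` idea described above: find the smallest group size in the strata, then draw that many individuals per group without replacement. The helpers `resolve_subsample()` and `subsample_ids()` are hypothetical illustrations, not assigner's internal code.

```r
# Hypothetical helpers, not assigner's implementation.

# Turn a `subsample` value into a per-group sample size.
# "min" means: use the smallest group size found in `strata`.
resolve_subsample <- function(subsample, strata) {
  if (is.null(subsample)) return(NULL)
  if (identical(subsample, "min")) return(min(table(strata)))
  subsample
}

# Draw `size` individuals per group, without replacement.
subsample_ids <- function(ids, strata, size) {
  groups <- split(ids, strata)
  picked <- lapply(groups, function(g) g[sample.int(length(g), size)])
  unlist(picked, use.names = FALSE)
}

# Usage with toy data
ids <- paste0("ind", 1:9)
strata <- c("A", "A", "A", "B", "B", "C", "C", "C", "C")
size <- resolve_subsample("min", strata)   # smallest group (B) has 2 individuals
set.seed(1)
subsample_ids(ids, strata, size)           # 2 individuals from each of A, B and C
```

Using `sample.int()` on indices avoids the `sample()` pitfall where a group holding a single numeric id would be treated as a range.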
* `pbmcapply` package.
* `pbmcapply` package.
* `dplyr` v.0.7.0
* `dlr`: simplified arguments, faster function and now creates the Dlr plots
* `SNPRelate` are removed until the bugs with Fst calculation are resolved.
* `assignment_ngs` introduced in last commit that was supposed to be a fix. Problem introduced by `stackr::change_pop_names`.
* `assigner` as a logo
* `fst_NEI87`
* `subsample` and `iteration.subsample` in `fst_NEI87` and `fst_WC84`
* `SNPRelate` bias issue is resolved, the option is unavailable
* `pbmcapply` for Windows
* `assignment_ngs` and `assignment_mixture`: code cleaning to prepare for CRAN and to make them easier to debug.
* `assigner` now works in parallel on Windows
* `write_gsi_sim` where the file was not created properly from an internal module.
* `assigner::fst_WC84` can now use `SNPRelate` to compute Fst. The confidence intervals are not implemented yet. The speed increase left me speechless: datasets with 30K SNPs are computed in less than 15 sec!
* `assigner::fst_WC84` is 40% faster!
* `assignment_ngs` during imputations, the imputation module could not recognise that REF/ALT alleles are not necessary or useful for assignment analysis.
* Enhancement to `assignment_ngs` and `assignment_mixture`: when `marker.number` includes `"all"`, `iteration.method` is automatically set to 1 for the assignment conducted with all the markers. Iterating at this point is useless and a waste of time.
* `assignment_mixture`: with `assignment.analysis = "gsi_sim"`, the unknown/mixture samples are compared with baseline populations using the markers common to each pair. Now, the tables include the number of markers used, and the summary provides the mean number of markers. This number will change each time randomness is used.
* `fst_NEI87`: very fast function that can compute the overall and pairwise Nei's (1987) Fst and F'st (prime). Bootstrap resampling of markers is available to build confidence intervals. The estimates are available as a data frame and as a matrix with the upper diagonal filled with Fst values and the lower diagonal filled with the confidence intervals. Jost's D is also given ;)
* `fst_WC84`: bug fix, the function was not properly configured for multi-allelic markers (e.g. microsatellites and the haplotype format from STACKS). Thanks to Craig McDougall for catching this.
* `assignment_mixture`: added a check to throw an error when `pop.levels` != the `pop.id` in `strata`
* `assignment_mixture`:
  * updated with latest modules from `stackr`.
  * simplified the identification of mixture or unknown samples. See doc.
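The statistics named in the `fst_NEI87` entry can be illustrated with their textbook forms. This sketch assumes per-marker within-population (`hs`) and total (`ht`) heterozygosities have already been estimated (Nei's 1987 sample-size corrections are not reproduced here), takes Fst as (Ht - Hs) / Ht, uses Hedrick's standardization for F'st, computes Jost's D, and bootstraps markers for a percentile confidence interval. The function names are hypothetical, and whether `fst_NEI87` computes F'st and the overall values exactly this way is an assumption.

```r
# Hypothetical sketch, not assigner's fst_NEI87 code.
# hs, ht: per-marker within-population and total expected heterozygosity,
#         assumed to be already estimated; k: number of populations.
nei_stats <- function(hs, ht, k) {
  Hs <- mean(hs)
  Ht <- mean(ht)
  fst <- (Ht - Hs) / Ht                          # Nei's Fst (Gst)
  fst_max <- (k - 1) * (1 - Hs) / (k - 1 + Hs)   # maximum Fst attainable given Hs
  fst_prime <- fst / fst_max                     # Hedrick's standardized F'st
  jost_d <- (Ht - Hs) / (1 - Hs) * k / (k - 1)   # Jost's D
  c(fst = fst, fst_prime = fst_prime, jost_d = jost_d)
}

# Percentile bootstrap CI obtained by resampling markers with replacement.
nei_boot_ci <- function(hs, ht, k, stat = "fst", n_boot = 1000, level = 0.95) {
  boot <- replicate(n_boot, {
    idx <- sample.int(length(hs), replace = TRUE)
    nei_stats(hs[idx], ht[idx], k)[[stat]]
  })
  alpha <- (1 - level) / 2
  quantile(boot, c(alpha, 1 - alpha), names = FALSE)
}

# Usage with toy values for 3 markers and 2 populations
hs <- c(0.3, 0.4, 0.5)
ht <- c(0.4, 0.5, 0.6)
nei_stats(hs, ht, k = 2)   # fst = 0.2, fst_prime ~ 0.467, jost_d ~ 0.333
set.seed(42)
nei_boot_ci(hs, ht, k = 2)
```

Averaging Hs and Ht before taking the ratio (rather than averaging per-marker ratios) keeps markers with low diversity from dominating the overall estimate. The same marker-resampling idea applies to the bootstrap confidence intervals mentioned for `fst_WC84`.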
* `fst_WC84`
* `tidyr::spread` and `tidyr::gather` for `data.table::dcast.data.table` and `data.table::melt.data.table` to make the code faster, I forgot to split genotypes into alleles for `gsi_sim`.
* You need to update [stackr](https://github.com/thierrygosselin/stackr) to v.0.2.7 to appreciate this new version of assigner.
* updated `assignment_ngs` with the separate stackr modules to simplify the function.
* new data file types available for `assignment_ngs`: `genepop` and `genind` objects.
* `assignment_ngs` now accepts any VCF input file, i.e. it's no longer limited to STACKS VCF.
* new arguments in `assignment_ngs`. The assignment using DAPC can now use either the optimized alpha-score (`adegenet.dapc.opt == "optim.a.score"`) or cross-validation (`adegenet.dapc.opt == "xval"`). This is useful for fine-tuning the trade-off between power of discrimination and over-fitting (for stability of group membership probabilities).
  Cross-validation with `adegenet.dapc.opt == "xval"` doesn't work with missing data, so it's only available with imputed data (i.e. `imputation.method == "rf"` or `"max"`). With non-imputed data, or by default, the optimized alpha-score is used (`adegenet.dapc.opt == "optim.a.score"`).
  When using `adegenet.dapc.opt == "xval"`, 2 new arguments are available: (1) `adegenet.n.rep` and (2) `adegenet.training`. See documentation for details.
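The rule in this entry (cross-validation only on imputed data, otherwise the optimized alpha-score) can be summarised as a small argument check. `resolve_dapc_opt()` is a hypothetical helper written for illustration; in adegenet itself the two approaches correspond to `optim.a.score()` and `xvalDapc()`, but how `assignment_ngs` wires them up internally is not shown here.

```r
# Hypothetical helper illustrating the rule, not assigner's code.
resolve_dapc_opt <- function(adegenet.dapc.opt = "optim.a.score",
                             imputation.method = NULL) {
  adegenet.dapc.opt <- match.arg(adegenet.dapc.opt, c("optim.a.score", "xval"))
  # Only the imputation methods named in the NEWS entry count as imputed data
  imputed <- !is.null(imputation.method) && imputation.method %in% c("rf", "max")
  if (adegenet.dapc.opt == "xval" && !imputed) {
    warning("cross-validation needs imputed data; using \"optim.a.score\"")
    adegenet.dapc.opt <- "optim.a.score"
  }
  adegenet.dapc.opt
}

resolve_dapc_opt("xval", imputation.method = "rf")   # "xval"
resolve_dapc_opt("xval")                             # warns, returns "optim.a.score"
```

Falling back with a warning, rather than stopping, mirrors the entry's statement that the optimized alpha-score is the default for non-imputed data.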
* removed arguments in `assignment_ngs`: the `pop.id.start` and `pop.id.end` arguments, which were confusing people, are removed. For those used to these arguments, they are now recycled in the new function `individuals2strata` in [stackr](https://github.com/thierrygosselin/stackr). The strata file created by this function can be used with the `strata` argument in `assignment_ngs`.
* 2 modified arguments in `assignment_ngs`: (1) `gsi_sim.filename` is now `filename`; and (2) if you didn't use the imputation argument, replace `imputation.method = FALSE` with `imputation.method = NULL` or leave the argument missing.
* simplified sections of code in `assignment_ngs` that dealt with `strata`, `pop.levels` and `pop.labels`.
* new function: `write_gsi_sim`. Writes a gsi_sim file from a data frame (wide or long/tidy). Used internally in [assigner](https://github.com/thierrygosselin/assigner) and might be of interest to users.
* Added a `NEWS.md` file to track changes to the package.
* `fst_WC84` is now a separate and very fast function that can compute the overall and pairwise Weir and Cockerham (1984) Theta/Fst. Bootstrap resampling of markers is available to build confidence intervals (for Louis Bernatchez and his students ;). The estimates are available as a data frame and as a matrix with the upper diagonal filled with Fst values and the lower diagonal filled with the confidence intervals.
* `assignment_ngs` + `assignment.analysis = "adegenet"` + `sampling.method = "ranked"`.
  A line at the beginning of a `gsi_sim` code section was deleted, making the assignment with adegenet go through that chunk of code and causing 100% assignment! `if (assignment.analysis == "gsi_sim") {code}` prevents this problem.
* `import_subsamples_fst` to import the Fst ranking results from all the subsample runs inside an assignment folder.
* `assignment_mixture` with `sampling.method = "ranked"` and `assignment.analysis = "adegenet"`.
* `assignment_mixture` for mixture analysis.
* `imputations` is now `impute.method`.
* `impute` with 2 options: `impute = "genotype"` or `impute = "allele"`.
* `data` and covers the three types of files the function can use: VCF file, PLINK tped/tfam or data frame of genotypes file.
* The `tfam` file will be used for the `strata` argument, unless a new one is provided. Columns 1, 3 and 4 of the `tped` are discarded. The remaining columns correspond to the genotype in the format `01/04`, where `A = 01, C = 02, G = 03 and T = 04`. For `A/T` format, use PLINK or bash to convert. Use [VCFTOOLS](http://vcftools.sourceforge.net/) with `--plink-tped` to convert very large VCF files. For `.ped` file conversion to `.tped`, use [PLINK](http://pngu.mgh.harvard.edu/~purcell/plink/) with `--recode transpose`.
* `method = "random"` and `imputation`
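A minimal sketch of reading one tped row as described in the tped/tfam entry above: drop columns 1, 3 and 4, keep the marker id in column 2, and decode the remaining allele columns (two per individual) with A = 1, C = 2, G = 3 and T = 4. `parse_tped_row()` is hypothetical, and treating 0 as missing follows the usual PLINK convention rather than anything stated in this NEWS file.

```r
# Hypothetical sketch, not assigner's tped import code.
parse_tped_row <- function(fields) {
  marker <- fields[2]                       # column 2: marker id (kept)
  alleles <- fields[-(1:4)]                 # columns 5+: two allele columns per individual
  lookup <- c("1" = "A", "2" = "C", "3" = "G", "4" = "T")
  # as.integer() maps "01" and "1" to the same code; 0 (missing) has no match -> NA
  decode <- function(x) unname(lookup[as.character(as.integer(x))])
  a1 <- decode(alleles[c(TRUE, FALSE)])     # first allele of each individual
  a2 <- decode(alleles[c(FALSE, TRUE)])     # second allele of each individual
  genotype <- ifelse(is.na(a1) | is.na(a2), NA_character_, paste(a1, a2, sep = "/"))
  list(marker = marker, genotype = genotype)
}

row <- c("1", "snp_1", "0", "12345", "01", "04", "3", "3", "0", "0")
parse_tped_row(row)
# marker: "snp_1"; genotype: "A/T" "G/G" NA
```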
* `GBS_assignment` to `assignment_ngs`. Stands for assignment with next-generation sequencing data.
* `df.file` if you don't have a VCF file. See documentation.
* `strata` if you don't have population id or other metadata info in the individual name. See documentation.
* `THL` to `thl` and `snp.LD` to `snp.ld` to follow convention.
* `iterations.subsample` changed to `iteration.subsample`.
* `iterations` changed to `iteration.method` to avoid confusion with other iteration arguments.
* `baseline` and `mixture` arguments from the function `GBS_assignment`.
  These options will be re-introduced later in a separate function.
* `marker.number` higher than the number of markers in the data set was causing problems. This could arise when using arguments that removed markers from the dataset (e.g. `snp.ld`, `common.markers` and `maf` filters).
* `sudo rm /usr/local/bin/gsisim`
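The `marker.number` problem above can be avoided by checking the requested values against the number of markers left after filtering (e.g. by `snp.ld`, `common.markers` or `maf`). A minimal sketch with a hypothetical helper, `check_marker_number()`; dropping the values that are too high, with a message, is one possible policy and not necessarily what assigner does.

```r
# Hypothetical helper, not assigner's code.
# marker.number: requested numbers of markers; n.markers: markers left after filtering.
check_marker_number <- function(marker.number, n.markers) {
  too_high <- marker.number > n.markers
  if (any(too_high)) {
    message("marker.number value(s) above the ", n.markers,
            " markers available were dropped: ",
            paste(marker.number[too_high], collapse = ", "))
  }
  kept <- marker.number[!too_high]
  if (length(kept) == 0) kept <- n.markers   # fall back to all remaining markers
  kept
}

check_marker_number(c(100, 500, 2000), n.markers = 800)   # message, returns 100 500
```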