Before this migration (i.e. for versions < 1.99.0), Rsamtools contained a copy of an old (9-10 year old?) version of the samtools and tabix C code.
The latest version of samtools (as of Feb 6, 2019) is 1.9: http://www.htslib.org/
In the recent years the samtools code has been split into samtools + htslib. Most of the code that used to be in samtools is now in htslib. The latest version of htslib is also 1.9.
Bioconductor package Rhtslib contains htslib 1.7, which is a decently recent version of htslib.
After this migration (i.e. for versions >= 1.99.0), Rsamtools no longer contains a copy of the samtools and tabix C code but compiles and links against Rhtslib instead (i.e. against htslib 1.7).
In Rsamtools 1.99.0 all the C/C++ code from the old Rsamtools was migrated to
Rhtslib, except for the code behind asBcf()
and razip()
:
asBcf()
(implemented in src/bcffile.c
) is currently disabled but will
need to be migrated if it turns out that some other package needs it (it
doesn't seem to be the case though). Note that asBcf()
isn't used anywhere
in Rsamtools either (no example in the man page, no unit test, not used in
the vignette).
razip()
(implemented in zip_compression.c
in the old Rsamtools) was
removed. This is because recent samtools and htslib no longer support RAZF.
From the samtools NEWS file:
RAZF and razip are superseded by BGZF/bgzip and have been removed from samtools.
This happened in samtools 1.0 (15 August, 2014).
Even though the C code behind applyPileups()
was migrated (pileupbam.c
),
the examples in its man page and most of the tests in
inst/unitTests/test_applyPileups.R
are currently failing and have been
disabled for now.
Test test_BcfFile_scanBcfHeader_remote
(located in
inst/unitTests/test_BcfFile.R
) passes when run interactively (or via
BiocGenerics:::testPackage('Rsamtools')
) but, for some obscure reason,
NOT in the context of R CMD check
. It has been disabled for now.
As of Feb 6, 2019, 159 Bioconductor packages (154 software, 3 data-experiment, and 2 workflow packages) depend directly on Rsamtools (i.e. via their Depends, Imports, or LinkingTo field). So we've only checked manually a few of them. This was on a 64-bit Ubuntu 16.04.5 LTS laptop running R 3.6 (2018-12-04 r75758) and Bioconductor 3.9 (current devel).
VariantAnnotation: migrated (changes committed),
passes R CMD check
ArrayExpressHTS: migrated (changes not committed yet, Angela Goncalves
contacted but didn't answer yet),
passes R CMD check
BitSeq: migrated (changes committed),
passes R CMD check
DiffBind: migrated (changes committed by Gord Brown),
passes R CMD check
h5vc: migrated (changes committed, including fixing pre-existing
build ERROR currently visible on the build report),
passes R CMD check
podkat: migrated (changes committed, including fixing
inst/examples/example1.vcf.gz
to make it compatible with bcftools 1.7),
passes R CMD check
qrqc: migrated (PR created at https://github.com/vsbuffalo/qrqc/pull/6
per Vince Buffalo's request), passes R CMD check
QuasR: migrated (PR created at https://github.com/fmicompbio/QuasR/pull/9
per Michael Stadler's request), passes R CMD check
seqbias: migrated (patch sent to, and applied by, Daniel Jones),
passes R CMD check
TransView: migrated (changes committed),
passes R CMD check
AllelicImbalance: migrated (contained old BCF file (ERP000101.bcf
, in
extdata/ERP000101_subset/
) that needed to be fixed and reindexed
because it no longer worked with recent bcftools/Rhtslib)
passes R CMD check
biovizBase: passes R CMD check
as-is
compEpiTools: passes R CMD check
as-is
SICtools: test_snpDiff()
(from test_snpDiff.R
) FAILS!
VariantTools: passes R CMD check
as-is
AllelicImbalance: migrated (changes not committed yet)
passes R CMD check
BaalChIP: passes R CMD check
as-is
chimera: passes R CMD check
as-is
CoverageView: passes R CMD check
as-is
exomeCopy: passes R CMD check
as-is
exomePeak: passes R CMD check
as-is
GenomicAlignments: passes R CMD check
as-is
gmapR: does NOT compile (needs to be migrated)
MEDIPS: passes R CMD check
as-is
methylPipe: passes R CMD check
as-is
rnaSeqMap: passes R CMD check
as-is
ssviz: passes R CMD check
as-is
systemPipeR: passes R CMD check
as-is
rnaseqGene: vignette builds as-is
sequencing: vignette does NOT build (because of AnnotationHub razip'ed
FASTA file AH18522
)
As of Feb 6, 2019, 8 CRAN packages depend directly on Rsamtools (i.e. via their Depends, Imports, or LinkingTo field).
BinQuasi: passes R CMD check
as-is
Brundle: passes R CMD check
as-is
ExomeDepth: passes R CMD check
as-is
hoardeR: fails to install because of unrelated pre-existing problem:
Error : object ‘importGFF3’ is not exported by 'namespace:GenomicTools'
(See https://www.r-project.org/nosvn/R.check/r-devel-linux-x86_64-debian-gcc/hoardeR-00install.html)
NIPTeR: passes R CMD check
as-is
PlasmaMutationDetector: passes R CMD check
as-is
RAPIDR: passes R CMD check
as-is
spp: passes R CMD check
as-is
RAZF is no longer supported in recent samtools, and old razip-compressed
FASTA files (.rz
) and their index files (rz.fai
) are now causing problems.
Here is an example from the sequencing workflow:
library(AnnotationHub)
ah <- AnnotationHub()
fa <- ah[["AH18522"]]
library(Rsamtools)
idx <- scanFaIndex(fa) # still works
long <- idx[width(idx) > 82000]
getSeq(fa, param=long) # ERROR! ('open' index failed)
The new way to go is to compress with bgzip and to recreate the index.
/path/to/samtools-1.7/htslib/bgzip myfile.fa # creates myfile.fa,gz
/path/to/samtools-1.7/samtools faidx myfile.fa,gz # generates index files
# myfile.fa.gz.fai
# and myfile.fa.gz.gzi
Note that the compression step is actually optional i.e. one can use
samtools faidx
directly on the uncompressed FASTA file:
gunzip myfile.fa,gz # uncompress to get myfile.fa back
/path/to/samtools-1.7/samtools faidx myfile.fa # generates index file
# myfile.fa.fai only
Note that in this case, only the .fai
index file is generated
(no .gzi
index file).
library(Rsamtools)
bgzip("myfile.fa")
indexFa("myfile.fa.bgz")
Then create a FaFile object that can be used with getSeq()
as usual:
fa <- FaFile("myfile.fa.bgz")
Here is an itemized summary of what still needs to happen (some details are provided above in this document):
software packages
Troubleshoot applyPileups()
unit test failures.
Troubleshoot test_BcfFile_scanBcfHeader_remote()
failure (test is
located in inst/unitTests/test_BcfFile.R
).
Migrate asBcf()
.
Update NEWS
file.
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.