This is a wrapper for the CD-HIT-EST algorithm. According to the CD-HIT user's guide, "CD-HIT-EST clusters a nucleotide dataset into clusters that meet a user-defined similarity threshold, usually a sequence identity." cd-hit-est comes bundled with transdecoder, so it is run from there.
cd_hit_est(
  input,
  output,
  wd = here::here(),
  other_args = NULL,
  echo = pkgconfig::get_config("baitfindR::echo", fallback = FALSE),
  ...
)
input | Character vector of length one; the path to the input file for cd-hit-est. Should be nucleotide (DNA) sequences in fasta format, since cd-hit-est clusters nucleotide datasets.
output | Character vector of length one; the name to assign to the output. Can include a path, in which case the output will be written there.
wd | Character vector of length one; the directory where the command will be run.
other_args | Character vector; other arguments to pass to cd-hit-est. Each should be an element of the vector.
echo | Logical; should the standard output and error be printed to the screen?
... | Additional other arguments. Not used by this function, but meant to be used by
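As a rough illustration of the other_args format described above, the sketch below builds an argument vector in which every flag and every value is its own element. The helper build_other_args() is hypothetical and not part of baitfindR; -c (sequence identity threshold) and -n (word length) are cd-hit-est options, and the values shown are only examples. It assumes the vector is handed to the command line element by element.

```r
# Hypothetical helper (not part of baitfindR): build a vector for other_args.
build_other_args <- function(identity = 0.95, word_length = 10) {
  # The identity threshold is a proportion between 0 and 1
  stopifnot(identity > 0, identity <= 1)
  # Each flag and each value is a separate element, not "-c 0.95" as one string
  c("-c", as.character(identity), "-n", as.character(word_length))
}

other_args <- build_other_args(identity = 0.95, word_length = 10)

# The vector could then be passed on, for example (paths are placeholders):
# cd_hit_est(input = "longest_orfs.cds", output = "clustered.fasta",
#            other_args = other_args)
```

Keeping flags and values as separate elements avoids relying on shell-style splitting of a single string.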
Within the R environment, a list with components specified in run.
Externally, two files will be written: according to the CD-HIT user's guide, "The output are two files: a fasta file of representative sequences and a text file of list of clusters."
The fasta file will be named with the value of output; the list of clusters will be the same, with .clstr appended.
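To give a feel for the cluster list, here is a minimal sketch that counts the sequences in each cluster. It assumes the usual CD-HIT .clstr layout, in which each cluster starts with a header line such as ">Cluster 0" followed by one line per member sequence. The function count_cluster_members() and the example lines are hypothetical and are not part of baitfindR.

```r
# Hypothetical helper (not part of baitfindR): count members per cluster
# from the lines of a .clstr file.
count_cluster_members <- function(clstr_lines) {
  # Drop empty lines
  clstr_lines <- clstr_lines[nzchar(clstr_lines)]
  # Header lines look like ">Cluster 0"; all other lines are members
  is_header <- grepl("^>Cluster", clstr_lines)
  # The running count of headers assigns each line to its cluster
  cluster_id <- cumsum(is_header)
  counts <- tabulate(cluster_id[!is_header], nbins = sum(is_header))
  names(counts) <- sub("^>", "", clstr_lines[is_header])
  counts
}

# Made-up lines in the assumed .clstr layout
example_lines <- c(
  ">Cluster 0",
  "0\t2799nt, >seq1... *",
  "1\t2214nt, >seq2... at +/99.50%",
  ">Cluster 1",
  "0\t1500nt, >seq3... *"
)
member_counts <- count_cluster_members(example_lines)
```

On real output, the lines could be read with readr::read_lines() from the path given in output with .clstr appended.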
Joel H Nitta, joelnitta@gmail.com
http://www.bioinformatics.org/cd-hit/, http://transdecoder.github.io
## Not run:
library(ape)
library(baitfindR)
# Make temp dir for storing output
temp_dir <- fs::dir_create(fs::path(tempdir(), "baitfindR_example"))
data("PSKY")
# Write downsized transcriptome to temp dir
write.FASTA(PSKY, fs::path(temp_dir, "PSKY"))
# Get CDS
transdecoder_long_orfs(
  transcriptome_file = fs::path(temp_dir, "PSKY"),
  wd = temp_dir
)
# Cluster similar genes in CDS
cd_hit_est(
  input = fs::path(temp_dir, "PSKY.transdecoder_dir", "longest_orfs.cds"),
  output = fs::path(temp_dir, "PSKY.cd-hit-est"),
  wd = temp_dir,
  echo = TRUE
)
# Check output
list.files(temp_dir)
head(readr::read_lines(fs::path(temp_dir, "PSKY.cd-hit-est")))
head(readr::read_lines(fs::path(temp_dir, "PSKY.cd-hit-est.clstr")))
# Cleanup
fs::file_delete(temp_dir)
## End(Not run)