Accurate estimating genome size is a crucial task in sequencing projects. Current methods often struggle with polyploidy or become inefficient when dealing with species that exceed a ploidy level of six. To address these challenges, we introduce findGSEP, an enhanced version of findGSE. findGSEP utilizes a segmented fitting approach to fit a normal distribution to polyploid species within a segmented framework. This ap-proach simplifies the process of single fitting while significantly expanding the range of ploidy levels it can handle. Moreover, findGSEP offers users interactive tools through both an open-source R application and a web application, facilitating reliable and precise estimation of genome size.
We have released our backend-server findGSEP and provide a CPU-based version of findGSEP online platform. Please check it out!!!
Download and install jellyfish from: Jellyfish Release
Count kmers using jellyfish:
sh
jellyfish count -C -m 21 -t 1 -s 5G *.fastq -o reads.mer
Note: Adjust the memory (-s) and threads (-t) parameters according to your server. This example uses 1 thread and 5GB of RAM. The kmer length (-m) may need to be scaled if you have low coverage or a high error rate. Always use 'canonical kmers' (-C).
Export the kmer count histogram:
sh
jellyfish histo -h 3000000 -t 10 -o reads.histo reads.mer
Note: The thread count (-t) should be scaled according to your server.
Upload reads.histo
to findGSEP.
Download and install KMC from: KMC GitHub
Count kmers using KMC:
sh
kmc -k21 -t1 -m5 -ci1 *.fastq reads_kmc tmp
Note: Adjust the memory (-m) and threads (-t) parameters according to your server. This example uses 1 thread and 5GB of RAM. The kmer length (-k) may need to be scaled if you have low coverage or a high error rate. The -ci1
option ensures that kmers with a count of at least 1 are included.
Export the kmer count histogram:
sh
kmc_tools transform reads_kmc histogram reads_kmc.histo
Note: This will create the histogram file reads_kmc.histo
.
Upload reads_kmc.histo
to findGSEP.
devtools
:install.packages("devtools")
devtools::install_github("sperfu/findGSEP")
You can check our demo dataset at our webserver or drive for complete data. We have provide precalculated histo file whose ploidy number ranging from tetraploid to octoploid.
# Set options (optional):
options(warn = -1)
# Define input parameters:
path <- "histo_files"
samples <- "your_file.histo"
sizek <- 21
exp_hom <- 200
ploidy <- 4
output_dir <- "outfiles"
xlimit <- -1
ylimit <- -1
range_left <- exp_hom * 0.2
range_right <- exp_hom * 0.2
#Call the findGSEP function with specified parameters:
findGSEP(path, samples, sizek, exp_hom, ploidy, range_left, range_right, xlimit, ylimit, output_dir)
# For any questions, usage inquiries, or reporting potential bugs, please contact the author.
After running, You will find 'your_file.histo_hap_genome_size_est.pdf' in your output_dir folder, please give it a try!!!
You can reference to our paramenter setting for those species we used in our webserver or demo dataset.
| Species | Expected Hom(Mb) | Ploidy number | Size k | |-------------------|------------------|---------------|--------| | Chinese sturgeon | 100 | 8 | 21 | | Strawberry | 100 | 8 | 21 | | Wheat | 150 | 6 | 21 | | Redwood | 80 | 6 | 21 | | Cotton | 150 | 4 | 21 | | Javanica | 200 | 4 | 21 | | Potato | 180 | 4 | 21 | | Floridensis | 220 | 4 | 21 | | Crayfish | 35 | 3 | 21 | | Enterolobii | 130 | 3 | 21 | | Incognita | 200 | 3 | 21 | | Seabass | 80 | 2 | 21 | | Bird | 40 | 2 | 21 | | Drosophila | 50 | 2 | 21 | | Pear | 100 | 2 | 21 | | Oyster | 50 | 2 | 21 |
If you enconter problem when installing devtools, especially for those packages below, please consider install them using conda install command:
conda install -c conda-forge r-gert
conda install -c conda-forge r-textshaping
conda install -c conda-forge r-ragg
conda install -c conda-forge r-pkgdown
If you enconter issues like:
could not find function "brewer.pal"
could not find function "alpha"
Solutions:
library(RColorBrewer)
library(ggplot2)
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.