genome_summary: Summary of genomic sequences from fasta file


View source: R/genome_summary.R

Description

Summary of genomic sequences from fasta file

Usage

genome_summary(in_f, N, genome)

Arguments

in_f

Path to an input file in fasta format, or a DNAStringSet object from the Biostrings package.

N

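numeric vector: percentage thresholds at which Nx and Lx are calculated. To calculate N90 (L90), N50 (L50), and N10 (L10), use N = c(90, 50, 10). With the sequences sorted from longest to shortest, N50 is the length of the shortest sequence in the smallest set of sequences that together cover 50 percent of the genome. L50 is the number of sequences in that set.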

genome

numeric: genome size. The default is NULL, in which case the genome size is calculated as the sum of the widths of the sequences in the input fasta file.
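
The Nx and Lx definitions given for the N argument can be illustrated with a minimal sketch in base R. The helper nx_lx below is hypothetical and not part of rskoseq; it assumes a plain numeric vector of sequence lengths and may differ from how genome_summary computes its values internally.

```r
# Hypothetical helper (not part of rskoseq): Nx and Lx from sequence lengths.
nx_lx <- function(widths, N = c(90, 50, 10), genome = NULL) {
  # Assumption: when no genome size is given, use the total sequence length
  if (is.null(genome)) genome <- sum(widths)
  sorted <- sort(unname(widths), decreasing = TRUE)  # longest sequence first
  cum <- cumsum(sorted)                              # running total of length
  res <- t(sapply(N, function(x) {
    # First position at which the running total reaches x percent of the genome
    idx <- which(cum >= genome * x / 100)[1]
    c(N = x, Nx = sorted[idx], Lx = idx)
  }))
  as.data.frame(res)
}

# Toy sequence lengths summing to 300
widths <- c(80, 70, 50, 40, 30, 20, 10)
nx_lx(widths)
# N = 90: Nx = 30, Lx = 5
# N = 50: Nx = 70, Lx = 2
# N = 10: Nx = 80, Lx = 1
```

In this sketch, passing a genome value larger than the summed lengths raises every threshold, so a threshold that the sequences never reach yields NA for both Nx and Lx.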

Examples

## Not run: 
fas <- "~/db/genome/CHOK1GS_HDv1/CHOK1GS_HDv1.dna.toplevel.fa.gz"
res <- genome_summary(in_f = fas, N = c(90,50,10))

# summary of genomic sequences
res$summary

# base frequency
res$base_freq

# distribution of contig lengths (log10 scale)
hist(log10(res$dist_cntg))


## End(Not run)

shkonishi/rskoseq documentation built on April 18, 2021, 3:50 p.m.