misc: Some miscellaneous stuff

misc R Documentation

Some miscellaneous stuff

Description

Miscellaneous utility functions. This page documents N50(), which computes the N50 contig size of a set of contigs.

Usage

N50(csizes)

Arguments

csizes

A vector containing the contig sizes.

Value

N50: The N50 value as an integer.

The N50 contig size

Definition: The N50 contig size of an assembly (aka the N50 value) is the largest contig size such that the contigs of that size or larger together contain at least 50% of the bases of the assembly.

How is it calculated? Sort the contigs from largest to smallest, then add up their sizes, starting with the largest, until the running total reaches at least half the total size of all the contigs. The N50 value is the size of the contig that was added last (i.e. the smallest of the big contigs that together cover at least 50% of the assembly).
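
The calculation described above can be sketched in a few lines of base R. This is a minimal illustration only: n50_sketch is a hypothetical helper name, not part of Biostrings, and it is not claimed to match how N50() is implemented internally.

```r
# Hypothetical helper for illustration; not the Biostrings implementation of N50().
n50_sketch <- function(csizes) {
  # Sort the contig sizes from largest to smallest.
  sorted <- sort(as.numeric(csizes), decreasing = TRUE)
  # Running total of bases, starting from the biggest contig.
  running <- cumsum(sorted)
  # Position of the first contig at which the running total
  # reaches at least half of the total size.
  idx <- which(running >= sum(sorted) / 2)[1]
  # The size of that contig is the N50 value.
  as.integer(sorted[idx])
}

# Five contigs with a total size of 20, so half the total is 10.
# Adding 6 then 5 gives 11 >= 10, so the last contig added has size 5.
sizes <- c(2, 3, 4, 5, 6)
n50_sketch(sizes)
```

With these toy sizes the two largest contigs (6 and 5) already cover more than half of the 20 bases, so the sketch returns 5.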

What for? The N50 value is a standard measure of the contiguity, and hence of the quality, of a de novo assembly.

Author(s)

Nicolas Delhomme <delhomme@embl.de>

See Also

XStringSet-class

Examples

  # Load Biostrings, which provides DNAStringSet(), DNA_BASES and N50():
  library(Biostrings)

  # Generate 10 random contigs with sizes between 100 and 10000:
  my.contig <- DNAStringSet(
                 sapply(
                   sample(100:10000, 10),
                   function(size)
                       paste(sample(DNA_BASES, size, replace=TRUE), collapse="")
                 )
               )

  # Get their sizes:
  my.size <- width(my.contig)

  # Calculate the N50 value of this set of contigs:
  my.contig.N50 <- N50(my.size)

Bioconductor/Biostrings documentation built on Dec. 16, 2024, 8:46 a.m.