Description GFF input Genome input Input synteny maps Input Tree Synder parameters Fagin parameters
This documentation is a work in progress ...
Absolute path to a directory containing a GFF file for each species used in the pipeline. This GFF file must contain at minimum mRNA and coding sequence (CDS) features. All start and stop positions must be relative to the reference genomes in FNA_DIR (see argument -n).
The following must be true of all GFF files:
Contain CDS, exon and mRNA entries
All CDS and exon entries contain a Parent tag in the 9th column
All features contain an ID or Name tag
Chr1 . mRNA 3631 5899 . + . ID=AT1G01010.1
Chr1 . CDS 3760 3913 . + . ID=AT1G01010.1.CDS-1;Parent=AT1G01010.1
Chr1 . CDS 3996 4276 . + . ID=AT1G01010.1.CDS-2;Parent=AT1G01010.1
Expected extension: *.gff
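The GFF requirements listed above can be checked before running the pipeline. The following is a minimal sketch, not part of the pipeline itself: the function name `check_gff_lines` and its `required_types` parameter are hypothetical, and it assumes tab-separated columns with `key=value` attributes separated by semicolons in the 9th column. The default requires only mRNA and CDS features; pass `("mRNA", "CDS", "exon")` to also require exons.

```python
def check_gff_lines(lines, required_types=("mRNA", "CDS")):
    """Return a list of problems found in GFF text lines (hypothetical helper)."""
    problems = []
    types_seen = set()
    for n, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments/directives
        cols = line.split("\t")
        if len(cols) != 9:
            problems.append(f"line {n}: expected 9 columns, found {len(cols)}")
            continue
        ftype = cols[2]
        types_seen.add(ftype)
        # Parse the 9th column into a tag -> value mapping.
        attrs = dict(
            field.split("=", 1) for field in cols[8].split(";") if "=" in field
        )
        if "ID" not in attrs and "Name" not in attrs:
            problems.append(f"line {n}: {ftype} has no ID or Name tag")
        if ftype in ("CDS", "exon") and "Parent" not in attrs:
            problems.append(f"line {n}: {ftype} has no Parent tag")
    for required in required_types:
        if required not in types_seen:
            problems.append(f"no {required} features found")
    return problems


if __name__ == "__main__":
    example = [
        "\t".join(["Chr1", ".", "mRNA", "3631", "5899", ".", "+", ".",
                   "ID=AT1G01010.1"]),
        "\t".join(["Chr1", ".", "CDS", "3760", "3913", ".", "+", ".",
                   "ID=AT1G01010.1.CDS-1;Parent=AT1G01010.1"]),
    ]
    print(check_gff_lines(example))
```

An empty list means no problems were detected by these particular checks; it does not guarantee the file is acceptable to the pipeline.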
This must be a FASTA file (extension 'fna', for FASTA Nucleic Acid). The headers must contain sequence IDs that match those in the GFF.
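A quick way to confirm that the genome and GFF agree is to compare the sequence IDs in both. The sketch below is illustrative only: the function names are hypothetical, and it assumes the sequence ID is the first whitespace-delimited token after `>` in each FASTA header and the first tab-separated column of each GFF line.

```python
def fasta_ids(fasta_lines):
    """Collect sequence IDs: the first token after '>' on each header line."""
    ids = set()
    for line in fasta_lines:
        if line.startswith(">"):
            tokens = line[1:].split()
            if tokens:
                ids.add(tokens[0])
    return ids


def gff_seqids(gff_lines):
    """Collect the first column of every non-comment, non-blank GFF line."""
    ids = set()
    for line in gff_lines:
        if line.strip() and not line.startswith("#"):
            ids.add(line.split("\t")[0])
    return ids


def missing_from_fasta(gff_lines, fasta_lines):
    """Sequence IDs used in the GFF that have no matching FASTA header."""
    return sorted(gff_seqids(gff_lines) - fasta_ids(fasta_lines))


if __name__ == "__main__":
    fasta = [">Chr1 example description", "ACGT", ">Chr2", "GGCC"]
    gff = ["Chr1\t.\tmRNA\t1\t4\t.\t+\t.\tID=a", "Chr3\t.\tmRNA\t1\t4\t.\t+\t.\tID=b"]
    print(missing_from_fasta(gff, fasta))
```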
Absolute path to a directory containing one synteny map for each species that will be compared. Each synteny map should consist of a single file named according to the pattern "<query>.vs.<target>.syn", for example, "arabidopsis_thaliana.vs.arabidopsis_lyrata.syn". These files should contain the following columns:
query contig name (e.g. chromosome or scaffold)
query start position
query stop position
target contig name
target start position
target stop position
score (not necessarily used)
strand relative to query
Example:
chr2 193631 201899 tchr2 193631 201899 100 +
chr2 225899 235899 tchr2 201999 202999 100 +
chr1 5999 6099 tchr1 6099 6199 100 +
chr1 5999 6099 tchr1 8099 8199 100 +
chr1 17714 18714 tchr2 17714 18714 100 +
chr2 325899 335899 tchr2 301999 302999 100 +
A synteny map like this can be created using a whole-genome synteny program, such as Satsuma (highly recommended). Building a single synteny map requires hundreds of CPU hours, so it is best done on a cluster. An example PBS script is provided; see src/satsuma.pbs.
Expected filename format: <query_sciname>.vs.<target_sciname>.syn
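To illustrate the eight-column layout and the filename convention, here is a small parsing sketch. The names `SyntenyBlock`, `parse_synteny_line`, and `split_map_name` are hypothetical, and the code assumes whitespace-delimited columns and a strand of either `+` or `-`; neither assumption is confirmed by the pipeline documentation.

```python
from typing import NamedTuple


class SyntenyBlock(NamedTuple):
    qcontig: str   # query contig name
    qstart: int    # query start position
    qstop: int     # query stop position
    tcontig: str   # target contig name
    tstart: int    # target start position
    tstop: int     # target stop position
    score: float   # score (not necessarily used)
    strand: str    # strand relative to query


def parse_synteny_line(line):
    """Parse one synteny map row into a SyntenyBlock."""
    fields = line.split()
    if len(fields) != 8:
        raise ValueError(f"expected 8 columns, found {len(fields)}")
    q, qa, qb, t, ta, tb, score, strand = fields
    if strand not in ("+", "-"):
        raise ValueError(f"unexpected strand: {strand!r}")
    return SyntenyBlock(q, int(qa), int(qb), t, int(ta), int(tb),
                        float(score), strand)


def split_map_name(filename):
    """Split '<query>.vs.<target>.syn' into (query, target)."""
    suffix = ".syn"
    if not filename.endswith(suffix) or ".vs." not in filename:
        raise ValueError(f"unexpected synteny map name: {filename!r}")
    query, target = filename[: -len(suffix)].split(".vs.", 1)
    return query, target


if __name__ == "__main__":
    print(parse_synteny_line("chr2 193631 201899 tchr2 193631 201899 100 +"))
    print(split_map_name("arabidopsis_thaliana.vs.arabidopsis_lyrata.syn"))
```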
Absolute path to a Newick format file specifying the topology of the species tree. It must contain all species used in the pipeline AND NO OTHERS (I may relax this restriction later).
NOTE: There must be no spaces in the species names.
Here is an example tree:
(Brassica_rapa,(Capsella_rubella,(Arabidopsis_lyrata,Arabidopsis_thaliana)));
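The tree constraints (every pipeline species present, no others, no spaces in names) can be checked with a short script. This is a rough sketch with hypothetical function names. It handles only simple Newick strings with at least one pair of parentheses and no quoted labels or comments; a full Newick parser would be more robust.

```python
import re


def newick_leaves(newick):
    """Leaf labels of a simple Newick string.

    A leaf label is any label that directly follows '(' or ','.
    Labels after ')' are internal node names and are ignored, as are
    branch lengths (everything from ':' onward).
    """
    labels = re.findall(r"[(,]\s*([^(),;:]+)", newick)
    return [label.strip() for label in labels if label.strip()]


def check_tree(newick, pipeline_species):
    """Return a list of problems with the species tree (empty if none found)."""
    leaves = newick_leaves(newick)
    problems = [
        f"species name contains whitespace: {name!r}"
        for name in leaves
        if re.search(r"\s", name)
    ]
    missing = sorted(set(pipeline_species) - set(leaves))
    extra = sorted(set(leaves) - set(pipeline_species))
    if missing:
        problems.append(f"species missing from tree: {missing}")
    if extra:
        problems.append(f"species in tree but not in pipeline: {extra}")
    return problems


if __name__ == "__main__":
    tree = ("(Brassica_rapa,(Capsella_rubella,"
            "(Arabidopsis_lyrata,Arabidopsis_thaliana)));")
    print(newick_leaves(tree))
    print(check_tree(tree, newick_leaves(tree)))
```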
See documentation in synder
default=0.05 - Base p-value cutoffs; these will be adjusted for multiple testing. This one applies to query protein versus target gene alignments.
default=0.05 - query protein versus all SI translated ORFs.
default=0.05 - query protein versus translated ORFs from spliced transcripts
default=0.05 - query genes versus entire SI (nucleotide match)
default=1000 - number of simulations
default=1000 - number of simulations
default=1000 - number of simulations
default=1e8 - Maximum value of m*n that will be searched
default=0.25 - Ratio of search interval to query interval below which an indel is called
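As a reading aid for the last parameter, the ratio test can be written out as below. This is only an interpretation of the one-line description: the function name and argument names are hypothetical, and how the pipeline actually measures interval lengths is not stated here.

```python
def is_indel(search_interval_length, query_interval_length, threshold=0.25):
    """True if the search interval is short enough, relative to the query
    interval, that an indel would be called under the stated rule."""
    if query_interval_length <= 0:
        raise ValueError("query interval length must be positive")
    return search_interval_length / query_interval_length < threshold


if __name__ == "__main__":
    # A 100 bp search interval for a 1000 bp query gives a ratio of 0.1.
    print(is_indel(100, 1000))
    # A 500 bp search interval for a 1000 bp query gives a ratio of 0.5.
    print(is_indel(500, 1000))
```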