make_assembly_spec: Generate an assembly annotation specification for use by...

View source: R/metadata.R

make_assembly_spec    R Documentation

Generate an assembly annotation specification for use by gather_preprocessing_metadata()

Description

This returns the default set of files and information that gather_preprocessing_metadata() will seek. It is admittedly a bit much. Each name in the returned list becomes one column in the final metadata, and the value stored under that name holds the parameters passed to the associated dispatcher.
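As a rough illustration of the list shape described above, the following sketch builds a small, hypothetical specification and turns it into a metadata table with one row per sample and one column per list name. The entry names, the "file" templates, and the dispatch_entry() and build_metadata() helpers are all assumptions made for illustration; they are not the actual contents of make_assembly_spec() or the internals of gather_preprocessing_metadata().

```r
## Hypothetical, abbreviated specification. The entry names and "file"
## templates are illustrative assumptions, NOT the real contents of
## make_assembly_spec().
example_spec <- list(
  "trimmed_reads" = list("file" = "outputs/{sample}/trimmomatic/reads.fastq.gz"),
  "unicycler_contigs" = list("file" = "outputs/{sample}/unicycler/assembly.fasta"),
  "prokka_genbank" = list("file" = "outputs/{sample}/prokka/annotation.gbk"))

## Hypothetical dispatcher: substitutes the sample ID into the "file"
## template. A real dispatcher would presumably read that file and
## extract the value of interest.
dispatch_entry <- function(params, sample) {
  gsub("{sample}", sample, params[["file"]], fixed = TRUE)
}

## Build one row per sample and one column per name in the specification.
build_metadata <- function(spec, samples) {
  meta <- data.frame("sampleid" = samples, stringsAsFactors = FALSE)
  for (column in names(spec)) {
    meta[[column]] <- vapply(samples,
                             function(s) dispatch_entry(spec[[column]], s),
                             FUN.VALUE = character(1), USE.NAMES = FALSE)
  }
  meta
}

meta <- build_metadata(example_spec, c("phage01", "phage02"))
colnames(meta)
```

In this sketch, keeping each dispatcher's parameters inside its named element means that adding a metadata column only requires adding one more entry to the list.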

Usage

make_assembly_spec()

Details

This was written for an assembly pipeline I wrote, which does the following:
1. Run Trimmomatic (the assemblies I was doing were MiSeq phage).
2. Run FastQC on the trimmed reads.
3. Use RACER to correct sequencer-based errors.
4. Perform an initial classification with Kraken against the standard database (so that any contamination is picked up).
5. Use Kraken to infer a hypothetical host for the phage and filter out reads from that host.
6. Classify the remaining sequence with Kraken against a viral database.
7. Generate an initial assembly via Unicycler.
8. Depth-filter said assembly.
9. Use BLAST to search the ICTV for likely taxonomy.
10. Count ORFs to define the +/- strands.
11. Use PhageTerm to define the DTRs and/or reorient the genome.
12. Perform a taxonomy search on the assembled genome via phastaf (so we can see whether it is segmented or contains multiple genomes).
13. Calculate coverage on a per-nucleotide basis.
14. Search for likely terminases, and reorient the genome if PhageTerm (#11) failed.
15. Create an initial annotation GenBank file via Prokka.
16. Supplement the Prokka ORFs via a trained Prodigal run.
17. Supplement them again via a promiscuous run of Glimmer.
18. Use PHANOTATE as the arbiter of 'correct' phage ORFs (i.e. the ORFs from #15-17 are only used if they agree with and/or do not interfere with these).
19. Merge the results from #15-18 into a single set of ORFs/GenBank file.
20. Calculate the assembly k-mer content via Jellyfish.
21. Look for tRNAs and tmRNAs via ARAGORN.
22. Look for tRNAs via tRNAscan.
23. Perform the set of BLAST/etc. searches defined by Trinotate.
24. Look for MDR genes via ABRicate.
25. Perform the set of BLAST/etc. searches defined by InterProScan.
26. Cross-reference the genome against the extant restriction enzyme catalog.
27. Calculate the codon adaptation index of each ORF against the putative host from #5.
28. Search for phage promoters.
29. Search for Rho termination signals.
30. Attempt to classify the phage as likely lysogenic or lytic via BACPHLIP.
31. Search for strong RNA secondary structures via RNAfold.
32. Merge the annotations collected from #21-29 into a larger GenBank file.
33. Repeat #32, but this time with feeling (#32 adds comments with confidence intervals; this step strips those out).
34. Make an initial visualization of the assembly via CGView.
35. Collect all the most likely useful output from above into a single archive.
36. Clean up the mess.
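Step 27 refers to the codon adaptation index (CAI). For reference, the standard definition (Sharp and Li, 1987) is sketched below; this page does not say whether the pipeline uses exactly this form or which host gene set serves as the reference, so treat it as background rather than a description of the implementation. Here X_ij is the count of codon j for amino acid i in the reference (host) gene set, the maximum runs over the synonymous codons k of amino acid i, and w_l is the relative adaptiveness of the l-th of the L codons counted in the ORF (Met, Trp, and stop codons are conventionally excluded).

```latex
w_{ij} = \frac{X_{ij}}{\max_{k} X_{ik}}, \qquad
\mathrm{CAI} = \left( \prod_{l=1}^{L} w_{l} \right)^{1/L}
             = \exp\!\left( \frac{1}{L} \sum_{l=1}^{L} \ln w_{l} \right)
```

A CAI near 1 indicates that the ORF mostly uses the codons the host favors most heavily; lower values indicate poorer adaptation to that host.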


elsayed-lab/hpgltools documentation built on May 9, 2024, 5:02 a.m.