Files_LULU_manuscript/README.md

Notes on setup

This file gives an overview of the tools and files used for the study "Reliable biodiversity metrics from co-occurrence based post-clustering curation of amplicon data". All steps of this study can be carried out on a single computer or platform. In practice, however, all analyses were carried out on a Linux server with 64 processors (AMD Opteron(tm) 6380), except the R scripts, which were run on a MacBook Pro (2.6 GHz Intel Core i7, 16 GB 1600 MHz DDR3). All analyses were carried out in one directory (analyses) and its sub-directories.

Bioinformatic tools

CLI tools were used for this study

R-packages used for this study

To replicate the analyses, the following packages (and their dependencies) need to be installed.

Provided scripts

Command line scripts

A number of scripts are provided with this manuscript. Place these in the /bin directory and make them executable with "chmod 755 SCRIPTNAME", or place the scripts directly in the directory or directories where they will be executed (i.e. the analyses directory). Most of them are simple shell scripts. Their context and use are described in the R markdown files (A-N) documenting the workflow.
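The installation step described above could be sketched as a small shell function. This is only an illustration: the function name `install_scripts` and both directory arguments are hypothetical and not part of the provided scripts, and the sketch assumes the scripts have a `.sh` extension.

```shell
# Sketch: copy every *.sh script from a source directory into a target
# directory and make each copy executable (mode 755).
# install_scripts and its two arguments are hypothetical names.
install_scripts() {
    src_dir=$1
    dest_dir=$2
    mkdir -p "$dest_dir" || return 1
    for script in "$src_dir"/*.sh; do
        [ -e "$script" ] || continue   # skip if the glob matched nothing
        cp "$script" "$dest_dir"/ || return 1
        chmod 755 "$dest_dir/$(basename "$script")" || return 1
    done
}
```

It could be called as `install_scripts ./provided_scripts ./analyses`, where both paths are placeholders for wherever the scripts were downloaded and where they should run.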

R-markdown files

This manuscript includes 9 R markdown files (including this one) documenting the analyses.

Data/Files

A number of files and data provided with this manuscript are necessary for the processing and need to be placed in the analyses directory.

Files used by the Alfa_demultiplex_universal.sh script: batchfile.list - file with library information for demultiplexing the R1/R2 paired-end fastq files. The script also uses the files with information on the primer-tag combinations used for the different samples in the different libraries.

Files used by the Alfa_demultiplex_for_DADA2.sh script: batchfileDADA2.list - file with library information for demultiplexing the R1/R2 paired-end fastq files. The script also uses the files with information on the primer-tag combinations used for the different samples in the different libraries. These files differ from the ones above because DADA2 requires the R1/R2 files to be demultiplexed without merging.

Table_plants_2014_cleaned.txt - Species/site matrix for the plant survey data.

Raw MiSeq data (not included in the supplementary material) are accessible at http://datadryad.org/resource/doi:10.5061/dryad.n9077.
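Before starting the workflow, it may help to confirm that the named files are in place. The following is a minimal sketch, assuming the three file names listed above; the function name `check_required_files` is hypothetical, and the primer-tag files are not checked because their names are not listed here.

```shell
# Sketch: report which of the required input files are missing from a
# directory. Returns 0 if all are present, 1 otherwise.
# check_required_files is a hypothetical helper, not a provided script.
check_required_files() {
    dir=$1
    missing=0
    for f in batchfile.list batchfileDADA2.list Table_plants_2014_cleaned.txt; do
        if [ ! -f "$dir/$f" ]; then
            echo "missing: $f"
            missing=1
        fi
    done
    return "$missing"
}
```

Running `check_required_files analyses` from the parent directory prints one line per missing file and returns a non-zero status if any file is absent.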



tobiasgf/lulu documentation built on Jan. 17, 2024, 3:57 p.m.