virDisco2-package: This is a short tutorial with an example of how to use this...


Description

The functions of the analysis pipeline are meant to be executed in the order in which they appear in the Examples section.

Examples

########## Create directories and file paths ##########

# To prepare the simulation, the user specifies the directory containing the sam-files
# and the directory containing the excel result files of the prior mapping.
# The simulation result files are stored in the metasimulations directory,
# which the user must also specify.
# The R function "create_directories_and_file_paths" creates subdirectories in the metasimulations directory
# and stores the directories and file paths needed later as global variables.

# Create the new folders "sam_directory", "excel_directory" and "metasimulations_directory" on the desktop,
# extract the sam-file into the sam directory and copy the excel-file into the excel directory.

{
  if (Sys.info()[["sysname"]] == "Windows") {
    desktop.directory = gsub("\\\\", "/", Sys.getenv("USERPROFILE")) # Use forward slashes
    desktop.directory = paste0(desktop.directory, "/Desktop")
    my.sam.directory = paste0(desktop.directory, "/sam_directory")
    my.excel.directory = paste0(desktop.directory, "/excel_directory")
    my.metasimulations.directory = paste0(desktop.directory, "/metasimulations_directory")
  }
  else if (Sys.info()[["sysname"]] == "Linux") {
    desktop.directory = system("xdg-user-dir DESKTOP", intern = TRUE)
    my.sam.directory = paste0(desktop.directory, "/sam_directory")
    my.excel.directory = paste0(desktop.directory, "/excel_directory")
    my.metasimulations.directory = paste0(desktop.directory, "/metasimulations_directory")
  }
  else {
    # file.choose() selects a file, not a directory, so use a directory dialog instead
    my.sam.directory = rChoiceDialogs::rchoose.dir() # Choose sam directory
    my.excel.directory = rChoiceDialogs::rchoose.dir() # Choose excel directory
    my.metasimulations.directory = rChoiceDialogs::rchoose.dir() # Choose metasimulations directory
  }
}
# showWarnings = FALSE: no warning if a directory already exists
dir.create(my.sam.directory, showWarnings = FALSE)
dir.create(my.excel.directory, showWarnings = FALSE)
dir.create(my.metasimulations.directory, showWarnings = FALSE)
# Example data shipped with the installed package (inst/extdata in the package sources)
unzip(zipfile = system.file("extdata", "NGS-12345_mapq_filtered.zip", package = "virDisco2"), exdir = my.sam.directory)
file.copy(from = system.file("extdata", "NGS-12345.xlsx", package = "virDisco2"), to = my.excel.directory)
create_directories_and_file_paths(sam.directory = my.sam.directory,
                                  excel.directory = my.excel.directory,
                                  metasimulations.directory = my.metasimulations.directory)

########## Extract information from reference genome ##########

# The function "extract_information_from_reference_genome"
# needs the file path of the formatted reference genome fasta-file,
# in which each sequence is stored on a single line (the original fasta-file
# may wrap sequences, e.g. after 80 characters),
# and extracts the NC numbers, species, sequences and sequence lengths.
# This information is stored in global variables for the subsequent simulation.

extract_information_from_reference_genome(formatted.reference_genome_fasta.file_path = file.choose()) # Path of formatted reference genome fasta file
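
# A minimal base-R sketch of the "formatted" fasta layout described above, assuming the
# only change is that wrapped sequence lines are joined so each sequence occupies a
# single line. unwrap_fasta_lines and the toy records are hypothetical, not part of virDisco2.
unwrap_fasta_lines = function(lines) {
  is.header = startsWith(lines, ">")
  record.id = cumsum(is.header) # Index of the record each line belongs to
  headers = lines[is.header]
  # Join all sequence lines that belong to the same record
  sequences = vapply(seq_along(headers), function(k) {
    paste0(lines[record.id == k & !is.header], collapse = "")
  }, character(1))
  as.vector(rbind(headers, sequences)) # Interleave header and sequence lines
}
toy.fasta = c(">toy_virus_A", "ACGTACGT", "ACG", ">toy_virus_B", "TTTT")
toy.formatted = unwrap_fasta_lines(toy.fasta)
# writeLines(toy.formatted, "formatted_reference.fasta") would write the result to disk.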

########## Calculate consensus sequence of all original reads ##########

# To visualise the simulation of read sequences, the function "consensus_sequence" produces a four-coloured graphic
# in which blue, green, red and yellow represent the nucleotides A, C, G and T.
# The read sequences of the original sam-file are plotted in rows,
# and each column corresponds to a base position of the reference genome.
# Below the mapped read sequences, the consensus sequence is displayed,
# and at the bottom, the corresponding section of the reference genome.
# The graphic is stored in the temporary directory.

# Before using the function "consensus_sequence",
# the functions "create_directories_and_file_paths" and "extract_information_from_reference_genome" must be executed.

consensus_sequence ( i = 1 , j = 3 )
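
# A sketch of a column-wise majority-vote consensus, assuming the reads have already been
# placed in a character matrix with one row per read and one column per reference position
# (NA where a read does not cover a position). It illustrates the idea only; it is not the
# code of consensus_sequence, and majority_consensus is a hypothetical helper.
majority_consensus = function(read.matrix) {
  apply(read.matrix, 2, function(column) {
    counts = table(column) # table() drops NA entries by default
    if (length(counts) == 0) return("N") # Position not covered by any read
    names(counts)[which.max(counts)] # Ties go to the first base in sorted order
  })
}
toy.reads = rbind(c("A", "C", "G", NA),
                  c("A", "C", "T", "T"),
                  c("A", "G", "T", "T"))
toy.consensus = majority_consensus(toy.reads)
# Colour coding used in the graphic: A blue, C green, G red, T yellow
nucleotide.colours = c(A = "blue", C = "green", G = "red", T = "yellow")
toy.consensus.colours = nucleotide.colours[toy.consensus]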

########## Simulation initialisation ##########

# Before the actual simulation can start,
# the function "sim_init" must calculate statistical distributions of the original mapping results.
# The distributions obtained from the sam-files are stored in the RDS objects directory, separately for each NGS run.

# Before using the function "sim_init",
# the functions "create_directories_and_file_paths" and "extract_information_from_reference_genome" must be executed.

# sim_init(file_indices = 1, type_of_sam_file = "init") # Uncomment to run (required by the following steps); processing takes a few minutes!
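
# A sketch of one way a "statistical distribution" can drive a simulation, assuming an
# empirical distribution: relative frequencies of observed values (here read lengths) are
# estimated and new values are drawn from them. Whether sim_init stores exactly this is an
# assumption; toy.read.lengths stands in for values parsed from a sam-file.
toy.read.lengths = c(100, 100, 150, 150, 150, 75)
length.table = table(toy.read.lengths)
length.values = as.numeric(names(length.table))
length.probabilities = as.vector(length.table) / sum(length.table)
set.seed(42)
# sample.int avoids sample()'s special case when only one distinct value exists
simulated.lengths = length.values[sample.int(length(length.values), size = 10,
                                             replace = TRUE, prob = length.probabilities)]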

########## Graphics of simulation initialisation ##########

# This function produces graphics based on the statistical distributions of the original sam-files
# that were generated by the function "sim_init".
# It plots the distributions of nucleotides, mapping quality, read length, start position and quality value,
# both for all mapped viruses combined and for the top three mapped viruses,
# and writes the plots into one pdf-file.

# Before using the function "sim_init_graphics",
# the functions "create_directories_and_file_paths", "extract_information_from_reference_genome" and "sim_init" must be executed.

sim_init_graphics ( file_indices = 1 )
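
# A sketch of writing several distribution plots into a single pdf-file with base graphics,
# as sim_init_graphics is described to do. The toy data, plot types and the output path in
# tempdir() are assumptions; the package's own plots may look different.
toy.mapq = c(0, 1, 2, 2, 42, 42, 42, 23)
toy.lengths = c(75, 100, 100, 150, 150, 150)
toy.pdf.path = file.path(tempdir(), "toy_distributions.pdf")
pdf(toy.pdf.path) # Open a pdf device; each plot becomes one page
barplot(table(toy.mapq), main = "Mapping quality", xlab = "MAPQ")
barplot(table(toy.lengths), main = "Read length", xlab = "Length (bp)")
invisible(dev.off()) # Close the device so the file is written completely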

########## Simulation of fastq-files ##########

# This function produces paired fastq-files based on the statistical distributions of the original sam-files
# which are generated by the function "sim_init".
# The simulated paired fastq-files will be stored in the fastq-directory.

# Do not use read_counts = "density" in combination with start_positions_and_read_lengths = "original"!
# Before using the function "sim_fastq",
# the functions "create_directories_and_file_paths", "extract_information_from_reference_genome" and "sim_init" must be executed.

sim_fastq ( file_indices = 1 , read_counts = "density", start_positions_and_read_lengths = "random", seed = 42 ) # Takes a few seconds to run!
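
# A sketch of the paired fastq layout, assuming a simple fragment model: read 1 is taken
# from the start of a fragment on the forward strand and read 2 is the reverse complement
# of the fragment's end; quality values are encoded as Phred+33 characters. All names here
# are hypothetical and do not describe the internals of sim_fastq.
reverse_complement = function(sequence) {
  complemented = chartr("ACGT", "TGCA", sequence)
  paste(rev(strsplit(complemented, "")[[1]]), collapse = "")
}
phred33 = function(quality.values) rawToChar(as.raw(quality.values + 33))
fastq_record = function(id, sequence, quality.values) {
  c(paste0("@", id), sequence, "+", phred33(quality.values)) # The four lines of one record
}
toy.fragment = "ACGTTGCAAGGCTTAA"
toy.read.length = 6
toy.read.1 = substr(toy.fragment, 1, toy.read.length)
toy.read.2 = reverse_complement(substr(toy.fragment, nchar(toy.fragment) - toy.read.length + 1, nchar(toy.fragment)))
toy.quality = rep(30, toy.read.length)
toy.fastq.1 = fastq_record("toy_read_1/1", toy.read.1, toy.quality)
toy.fastq.2 = fastq_record("toy_read_1/2", toy.read.2, toy.quality)
# writeLines(toy.fastq.1, "toy_R1.fastq"); writeLines(toy.fastq.2, "toy_R2.fastq")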

########## Mapping of fastq-files ##########

# The simulated fastq-files are mapped against the artificial viral reference genome with bowtie2.
# Reads with a low mapping quality can then be filtered out (see the argument mapq_filter_threshold).
# The new sam-file containing the filtered reads is stored separately in the same directory.

library(rChoiceDialogs)
my.prefix = rchoose.dir() # Choose prefix (folder) of reference genome bowtie2 index files
my.prefix = gsub("\\", "/", my.prefix, fixed = TRUE) # Replace Windows backslashes with forward slashes
my.prefix = paste0(my.prefix,"/")
mapping_bowtie2 ( file_indices = 1 , reference_genome_index_bowtie2.directory = my.prefix , mapq_filter_threshold = 2 , threads = 2 ) # Takes a few seconds to run!
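
# A sketch of a mapping-quality filter on sam-file lines, assuming reads whose MAPQ (the fifth
# tab-separated field) is greater than or equal to the threshold are kept; whether
# mapping_bowtie2 uses >= or > is not stated here. Header lines starting with "@" are kept.
# filter_sam_by_mapq and the toy records are hypothetical.
filter_sam_by_mapq = function(sam.lines, threshold) {
  is.header = startsWith(sam.lines, "@")
  mapq = vapply(strsplit(sam.lines[!is.header], "\t", fixed = TRUE),
                function(fields) as.integer(fields[5]), integer(1))
  c(sam.lines[is.header], sam.lines[!is.header][mapq >= threshold])
}
toy.sam = c("@HD\tVN:1.0",
            "r1\t0\tNC_toy\t1\t42\t4M\t*\t0\t0\tACGT\tIIII",
            "r2\t0\tNC_toy\t5\t1\t4M\t*\t0\t0\tTTGA\tIIII")
toy.filtered = filter_sam_by_mapq(toy.sam, threshold = 2)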

########## Calculation of mapping and error rates ##########

# In total, this function computes six diagnostic parameters as relative frequencies:
# two mapping rates and four error rates.
# In addition, it computes three absolute frequencies: the number of all mapped reads,
# of reads mapped to the viruses in the excel list and of reads mapped correctly.
# These values are calculated from the mapping files of the simulated fastq-files
# produced in the previous step.
# The excel-file with the original mapping results is extended by six columns for the diagnostic parameters
# and saved as a new excel-file.

error_rates ( file_indices = 1 )
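
# A sketch of how mapping and error rates can be formed as relative frequencies from absolute
# counts like those named above. The total number of simulated reads and the three rate
# definitions below are illustrative assumptions, not the six parameters of error_rates.
toy.counts = c(simulated = 1000, mapped = 900, mapped_to_list = 850, mapped_correctly = 800)
toy.rates = c(
  mapping_rate = toy.counts[["mapped"]] / toy.counts[["simulated"]],
  list_mapping_rate = toy.counts[["mapped_to_list"]] / toy.counts[["simulated"]],
  error_rate = 1 - toy.counts[["mapped_correctly"]] / toy.counts[["mapped"]] # Share of mapped reads not mapped correctly
)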

########## Simulation initialisation of simulated sam-files ##########

# sim_init(file_indices = 1, type_of_sam_file = "sim") # Uncomment and execute after mapping the simulated fastq-files!

########## Graphics of simulated sam-files ##########

# This function produces graphics based on the statistical distributions of the original sam-files
# and of the sam-files obtained by mapping the simulated fastq-files,
# which are generated by the function "sim_init" with the parameters "init" and "sim".
# It plots the distributions of nucleotides, mapping quality, read length, start position and quality value,
# both for all mapped viruses combined and for the top three mapped viruses,
# for both types of sam-files and writes the plots into one pdf-file.

# Before using the function "sim_sam_graphics",
# the functions "create_directories_and_file_paths", "extract_information_from_reference_genome" and "sim_init"
# with the parameters "init" and "sim" must be executed.

sim_sam_graphics ( file_indices = 1 )
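
# A sketch of comparing an original and a simulated distribution side by side, assuming both
# are summarised as relative frequencies over a common set of values. The toy read lengths
# and the output path in tempdir() are assumptions; sim_sam_graphics may plot differently.
toy.original = c(75, 100, 100, 150)
toy.simulated = c(100, 100, 150, 150, 150)
all.values = sort(unique(c(toy.original, toy.simulated)))
# Using the same factor levels gives zero counts for values missing in one sample
freq.original = as.vector(table(factor(toy.original, levels = all.values))) / length(toy.original)
freq.simulated = as.vector(table(factor(toy.simulated, levels = all.values))) / length(toy.simulated)
comparison = rbind(original = freq.original, simulated = freq.simulated)
colnames(comparison) = all.values
comparison.pdf = file.path(tempdir(), "toy_comparison.pdf")
pdf(comparison.pdf)
barplot(comparison, beside = TRUE, legend.text = rownames(comparison),
        xlab = "Read length (bp)", ylab = "Relative frequency")
invisible(dev.off())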

Moritz-Kohls/virDisco2 documentation built on Feb. 13, 2020, 12:32 a.m.