nf_sarek_datasets | R Documentation |
Organize variant call files from Nextflow Sarek into 3-4 datasets, grouping files by variant type and workflow. Dataset titles have the format "type Genomic Variants - workflow Pipeline", e.g. "Somatic Genomic Variants - Strelka Pipeline". This assumes that you want datasets that segregate Somatic and Germline calls, which makes sense for NF because Germline calls can be treated differently. The function uses the latest version of all files and creates a Draft version of each dataset.
nf_sarek_datasets(
output_map,
parent,
workflow = c("FreeBayes", "Mutect2", "Strelka", "DeepVariant"),
verbose = TRUE,
dry_run = TRUE
)
output_map |
The |
parent |
Synapse id of parent project where the dataset will live. |
workflow |
One of the workflows used: FreeBayes, Mutect2, Strelka, or DeepVariant. |
verbose |
Optional, whether to be verbose – defaults to TRUE. |
dry_run |
If TRUE, don't actually store dataset, just return the data object for inspection or further modification. |
Grouping the files only requires the Synapse entity id, variant type, and workflow. Instead of getting this info by running map_* as in the example, you may prefer to use a fileview. In that case you just need to download a table from a fileview that has id (renamed to output_id) plus the dataType and workflow annotations.
The fileview can only be used after the files are annotated. If you want to create datasets before files are annotated, then you have to use map_*.
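The fileview route described above might look like the following minimal sketch. The fileview data frame is a hypothetical stand-in for a table downloaded from a Synapse fileview, with placeholder ids and annotation values. Whether nf_sarek_datasets needs any columns beyond output_id, dataType, and workflow, or a specific table class, is an assumption to check against the package source.

```r
# Hypothetical stand-in for a table downloaded from a fileview.
# Ids and annotation values are placeholders, not real Synapse data.
fileview <- data.frame(
  id = c("syn0000001", "syn0000002", "syn0000003"),
  dataType = c("SomaticVariants", "GermlineVariants", "SomaticVariants"),
  workflow = c("Strelka", "Strelka", "Mutect2"),
  stringsAsFactors = FALSE
)

# Keep only the grouping columns and rename id to output_id.
output_map <- fileview[, c("id", "dataType", "workflow")]
names(output_map)[names(output_map) == "id"] <- "output_id"

# Preview how files would group by variant type and workflow.
groups <- split(
  output_map$output_id,
  list(output_map$dataType, output_map$workflow),
  drop = TRUE
)
print(groups)
```

A table shaped like output_map could then be passed as the first argument in place of the map_* result, keeping dry_run = TRUE to inspect the datasets first.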
Finally, datasets cannot use the same name if stored in the same project, so if there are multiple batches, the names will have to be made unique by adding the batch number, source data id, processing date, or whatever makes sense.
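As a rough illustration of making names unique across batches, the sketch below builds titles in the documented format with a batch label appended. The helper make_unique_title is hypothetical and not part of the package. How the resulting title is applied to a dataset object returned with dry_run = TRUE depends on the Synapse client and is not shown here.

```r
# Hypothetical helper (not part of the package): append a batch label
# to the documented title format so names stay unique within a project.
make_unique_title <- function(type, workflow, batch) {
  sprintf("%s Genomic Variants - %s Pipeline (%s)", type, workflow, batch)
}

titles <- make_unique_title(
  type = c("Somatic", "Germline"),
  workflow = "Strelka",
  batch = "batch 2"
)

# Fail early if any two titles collide.
stopifnot(!anyDuplicated(titles))
print(titles)
```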
A list of dataset objects.
## Not run:
syn_out <- "syn26648589"
m <- map_sample_output_sarek(syn_out)
datasets <- nf_sarek_datasets(m, parent = "syn26462036", dry_run = FALSE) # use a test project
## End(Not run)