biobakery_updateInput: Update input
In shbrief/bioBakeryR: Run bioBakery workflows in Terra

Description

The bioBakeryR workflow takes its input fastq file information as a path to a file (in a Google bucket) that lists all of the read1 files. This file must contain the full path to each file and is expected to include only the read1 files (not those for read2). The name of each sample is computed from the read pair identifier and the input file extension provided. For example, a file named SAMPLE1.R1.fastq.gz would have a sample name of "SAMPLE1", a read1 identifier of ".R1", and an extension of ".fastq.gz". Each sample is expected to have two files (one for each read of the pair).
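The naming rule described above can be sketched in a few lines of R. This is only an illustration of the rule as stated here, not the code the workflow itself runs; `derive_sample_name` is a hypothetical helper that does not exist in bioBakeryR, and the bucket path is a placeholder.

```r
# Hypothetical helper, not part of bioBakeryR: illustrates the naming rule only.
derive_sample_name <- function(path,
                               read1_identifier = "_R1",
                               extension = ".fastq.gz") {
  file_name <- basename(path)
  suffix <- paste0(read1_identifier, extension)
  # Fail early if the file does not follow the expected naming pattern
  if (!endsWith(file_name, suffix)) {
    stop("File name does not end with '", suffix, "': ", file_name)
  }
  # Drop the trailing read identifier and extension to get the sample name
  substr(file_name, 1, nchar(file_name) - nchar(suffix))
}

sample_name <- derive_sample_name(
  "gs://your_data_Google_bucket_id/SAMPLE1.R1.fastq.gz",
  read1_identifier = ".R1"
)
sample_name  # "SAMPLE1"
```

Matching the suffix literally (rather than as a regular expression) avoids the "." in ".R1" being treated as a wildcard.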

Usage

biobakery_updateInput(
  workspaceName,
  ProjectName,
  InputRead1Files,
  InputMetadataFile = NULL,
  AdapterType = "NexteraPE",
  InputExtension = ".fastq.gz",
  InputRead1Identifier = "_R1",
  InputRead2Identifier = "_R2",
  accountEmail = gcloud_account(),
  billingProjectName = gcloud_project()
)

Arguments

`workspaceName`	Name of the workspace
`ProjectName`	The name of the sequencing project. The final output report and zip archive will use this name (only alphanumeric characters allowed).
`InputRead1Files`	A path to a file (in a Google bucket) that lists all of the read1 files. This file must contain the full path to each file and is expected to include only the read1 files (not those for read2). The name of each sample is computed from the read pair identifier and the input file extension provided. For example, a file named `SAMPLE1.R1.fastq.gz` would have a sample name of "SAMPLE1", a read1 identifier of ".R1", and an extension of ".fastq.gz". Each sample is expected to have two files (one for each read of the pair).
`InputMetadataFile`	(optional) A path to a file (in a Google bucket) containing a metadata table. The visualization task uses this file to annotate the figures with metadata. Default is `NULL`.
`AdapterType`	The type of adapter to filter. Available options are "NexteraPE", "TruSeq2", and "TruSeq3". Default is "NexteraPE".
`InputExtension`	The extension for all of the input files. Default is `.fastq.gz`.
`InputRead1Identifier`	The identifier in the file name that marks read1 files. Default is `_R1`.
`InputRead2Identifier`	The identifier in the file name that marks read2 files. Default is `_R2`.
`accountEmail`	Email linked to Terra account
`billingProjectName`	Name of the billing project
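
A call using the arguments above might look like the following sketch. Every value is a hypothetical placeholder (workspace name, project name, bucket path), and the call is guarded so that nothing is sent to Terra unless `run_update` is switched on. It assumes the function is exported from the bioBakeryR package and that `accountEmail` and `billingProjectName` are left at their defaults.

```r
# All values below are hypothetical placeholders; replace them with your own.
update_args <- list(
  workspaceName        = "my-biobakery-workspace",
  ProjectName          = "Project1",
  InputRead1Files      = "gs://your_data_Google_bucket_id/fastq_list.txt",
  AdapterType          = "NexteraPE",
  InputExtension       = ".fastq.gz",
  InputRead1Identifier = ".R1",
  InputRead2Identifier = ".R2"
)

# ProjectName may contain only alphanumeric characters
stopifnot(grepl("^[[:alnum:]]+$", update_args$ProjectName))

# AdapterType must be one of the documented options
stopifnot(update_args$AdapterType %in% c("NexteraPE", "TruSeq2", "TruSeq3"))

# Guarded so the sketch can be sourced without contacting Terra.
run_update <- FALSE
if (run_update && requireNamespace("bioBakeryR", quietly = TRUE)) {
  do.call(bioBakeryR::biobakery_updateInput, update_args)
}
```

The two `stopifnot()` checks only mirror the constraints stated in the argument descriptions; whether the function performs the same validation itself is not documented here.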

Details

To generate a file to use as input for `InputRead1Files`, follow the [Terra instructions](https://support.terra.bio/hc/en-us/articles/360033353952-Creating-a-list-file-of-reads-for-input-to-a-workflow), adding the `InputRead1Identifier` and the `InputExtension` to command #2. For example, with `InputRead1Identifier = ".R1"` and `InputExtension = ".fastq.gz"`, command #2 becomes `gsutil ls gs://your_data_Google_bucket_id/ | grep ".fastq.gz" | grep ".R1" > ubams.list`. Because this workflow looks for fastq or fastq.gz input files, you might also rename the file list in this command from `ubams.list` to `fastq_list.txt`.
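
If the bucket listing is already available in R as a character vector, the same filtering can be sketched without `grep`. The listing below is made up for illustration, and `select_read1` is a hypothetical helper rather than a bioBakeryR function. It is slightly stricter than the shell pipeline: it keeps only paths that end in the read1 identifier followed by the extension, instead of paths that merely contain both strings.

```r
# Hypothetical bucket listing, in the form `gsutil ls` might return it
bucket_listing <- c(
  "gs://your_data_Google_bucket_id/SAMPLE1.R1.fastq.gz",
  "gs://your_data_Google_bucket_id/SAMPLE1.R2.fastq.gz",
  "gs://your_data_Google_bucket_id/SAMPLE2.R1.fastq.gz",
  "gs://your_data_Google_bucket_id/SAMPLE2.R2.fastq.gz",
  "gs://your_data_Google_bucket_id/metadata.tsv"
)

# Hypothetical helper: keep paths ending in <read1 identifier><extension>
select_read1 <- function(paths,
                         read1_identifier = ".R1",
                         extension = ".fastq.gz") {
  paths[endsWith(paths, paste0(read1_identifier, extension))]
}

read1_files <- select_read1(bucket_listing)

# Write one full path per line to a local temporary file
list_file <- file.path(tempdir(), "fastq_list.txt")
writeLines(read1_files, list_file)
```

The file written here is local. Since `InputRead1Files` expects a path in a Google bucket, the list would still need to be copied to the bucket before use.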

shbrief/bioBakeryR documentation built on April 22, 2022, 3:58 a.m.