biobakery_updateInput: Update input

biobakery_updateInput R Documentation

Update input

Description

The bioBakeryR workflow takes its input fastq files as a file path (in a Google bucket) pointing to a list of all of the read1 files. This list file must contain the full path to every file and is expected to include only the read1 files (not those for read2). The name of each sample is computed from the read pair identifier and the input file extension provided. For example, a file named SAMPLE1.R1.fastq.gz would have a sample name of "SAMPLE1", a read1 identifier of ".R1", and an extension of ".fastq.gz". Each sample is expected to have two files (one file for each read of the pair).
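The naming rule described above can be sketched in base R. This is only an illustration, not the package's own code: the helper `split_read1_name` and the bucket path are hypothetical, and the sketch assumes the read identifier sits directly before the extension, as in the SAMPLE1 example.

```r
# Hypothetical helper (not part of bioBakeryR): derive the sample name and the
# expected read2 path from a read1 path.
split_read1_name <- function(path,
                             read1_id = "_R1",
                             read2_id = "_R2",
                             extension = ".fastq.gz") {
  suffix <- paste0(read1_id, extension)
  if (!endsWith(path, suffix)) {
    stop("Path does not end with '", suffix, "': ", path)
  }
  # Everything before the read1 identifier, directory included
  stem <- substr(path, 1, nchar(path) - nchar(suffix))
  list(
    sample = basename(stem),                      # file name minus id + extension
    read1  = path,
    read2  = paste0(stem, read2_id, extension)    # same stem, read2 identifier
  )
}

# Made-up bucket path using the ".R1"/".R2" identifiers from the example
info <- split_read1_name("gs://my-bucket/SAMPLE1.R1.fastq.gz",
                         read1_id = ".R1", read2_id = ".R2")
info$sample  # "SAMPLE1"
info$read2   # "gs://my-bucket/SAMPLE1.R2.fastq.gz"
```

Working on the full path rather than rebuilding it from its directory keeps the `gs://` prefix untouched.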

Usage

biobakery_updateInput(
  workspaceName,
  ProjectName,
  InputRead1Files,
  InputMetadataFile = NULL,
  AdapterType = "NexteraPE",
  InputExtension = ".fastq.gz",
  InputRead1Identifier = "_R1",
  InputRead2Identifier = "_R2",
  accountEmail = gcloud_account(),
  billingProjectName = gcloud_project()
)

Arguments

workspaceName

Name of the workspace

ProjectName

The name of the sequencing project. The final output report and zip archive will use this name (only alphanumeric characters allowed).

InputRead1Files

A file path (in a Google bucket) pointing to a list of all of the read1 files. This list file must contain the full path to every file and is expected to include only the read1 files (not those for read2). The name of each sample is computed from the read pair identifier and the input file extension provided. For example, a file named SAMPLE1.R1.fastq.gz would have a sample name of "SAMPLE1", a read1 identifier of ".R1", and an extension of ".fastq.gz". Each sample is expected to have two files (one file for each read of the pair).

InputMetadataFile

(optional) A file path (in a Google bucket) to a metadata table. This file is used by the visualization task to annotate the figures with metadata. Default is NULL.

AdapterType

The type of adapter to filter. Available options are "NexteraPE", "TruSeq2", and "TruSeq3". Default is "NexteraPE".

InputExtension

The extension for all of the input files. Default is ".fastq.gz".

InputRead1Identifier

The identifier in the file name for those files that are read1. Default is "_R1".

InputRead2Identifier

The identifier in the file name for those files that are read2. Default is "_R2".

accountEmail

Email linked to Terra account

billingProjectName

Name of the billing project
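As a rough illustration of the argument constraints listed above, the sketch below checks ProjectName and AdapterType before a call is made. The helper `check_update_args`, the project name, the workspace name, and the bucket path are all hypothetical. The actual call is left commented out because it needs the package and Terra credentials.

```r
# Hypothetical pre-flight check (not part of bioBakeryR)
check_update_args <- function(ProjectName, AdapterType = "NexteraPE") {
  # ProjectName: only alphanumeric characters are allowed
  if (!grepl("^[A-Za-z0-9]+$", ProjectName)) {
    stop("ProjectName may contain only alphanumeric characters: ", ProjectName)
  }
  # AdapterType: must be one of the documented options
  allowed <- c("NexteraPE", "TruSeq2", "TruSeq3")
  if (!AdapterType %in% allowed) {
    stop("AdapterType must be one of: ", paste(allowed, collapse = ", "))
  }
  list(ProjectName = ProjectName, AdapterType = AdapterType)
}

checked <- check_update_args("IBD2022", "TruSeq3")

# Placeholder values; requires bioBakeryR and a configured gcloud account.
# biobakery_updateInput(
#   workspaceName   = "my-workspace",
#   ProjectName     = checked$ProjectName,
#   InputRead1Files = "gs://my-bucket/fastq_list.txt",
#   AdapterType     = checked$AdapterType
# )
```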

Details

To generate a file to use as input for InputRead1Files, follow the [Terra instructions](https://support.terra.bio/hc/en-us/articles/360033353952-Creating-a-list-file-of-reads-for-input-to-a-workflow), adding the InputRead1Identifier and the InputExtension to command #2. For example, with InputRead1Identifier = ".R1" and InputExtension = ".fastq.gz", command #2 becomes gsutil ls gs://your_data_Google_bucket_id/ | grep ".fastq.gz" | grep ".R1" > ubams.list. Because this workflow expects fastq or fastq.gz input files, you may also want to change the name of the file list in this command from ubams.list to fastq_list.txt.
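The same filtering can be sketched in R once a bucket listing is available as a character vector. The paths below are made up and stand in for the output of gsutil ls. Unlike grep ".R1", where the dot matches any character, `endsWith` compares the suffix literally, so a name such as SAMPLE2XR1.fastq.gz.md5 is not picked up. The sketch also checks that every read1 file has a read2 partner in the listing.

```r
# Stand-in for the output of `gsutil ls`; these paths are made up.
listing <- c(
  "gs://my-bucket/SAMPLE1.R1.fastq.gz",
  "gs://my-bucket/SAMPLE1.R2.fastq.gz",
  "gs://my-bucket/SAMPLE2.R1.fastq.gz",
  "gs://my-bucket/SAMPLE2.R2.fastq.gz",
  "gs://my-bucket/SAMPLE2XR1.fastq.gz.md5",
  "gs://my-bucket/notes.txt"
)
read1_id  <- ".R1"
read2_id  <- ".R2"
extension <- ".fastq.gz"

# Keep only paths that literally end in "<read1_id><extension>"
read1_suffix <- paste0(read1_id, extension)
read1_files  <- listing[endsWith(listing, read1_suffix)]

# Build the expected read2 path for each read1 file and report any gaps
stems         <- substr(read1_files, 1, nchar(read1_files) - nchar(read1_suffix))
read2_files   <- paste0(stems, read2_id, extension)
missing_read2 <- setdiff(read2_files, listing)
if (length(missing_read2) > 0) {
  warning("No read2 file found for: ", paste(missing_read2, collapse = ", "))
}

# Write the read1 list locally, one full path per line
list_file <- file.path(tempdir(), "fastq_list.txt")
writeLines(read1_files, list_file)
```

The resulting file is written locally, so it would still need to be copied into the Google bucket before its path is passed as InputRead1Files.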


shbrief/bioBakeryR documentation built on April 22, 2022, 3:58 a.m.