selex.loadAnnotation: Load a sample annotation file



Description

A function used to load sample metadata contained within a sample annotation file and make it visible to the current SELEX session. These samples can then be used to create sample handles (see selex.sample).

Usage

selex.loadAnnotation(config_path, data_folder=NULL)

Arguments

config_path

Location on disk to the sample annotation file.

data_folder

Location on disk where FASTQ sample files are stored. This is either an absolute path, or relative to the location of the annotation file. If unspecified, it uses the parent folder of the annotation file.
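The rule for data_folder can be illustrated with a small base R sketch. The helper resolve_data_folder and the paths below are hypothetical and are not part of the SELEX package; the code only mirrors the behavior described above and is not the package's actual implementation.

```r
# Hypothetical helper (not part of SELEX): mirrors the documented rule
# for how data_folder relates to the annotation file location.
resolve_data_folder <- function(config_path, data_folder = NULL) {
  annotation_dir <- dirname(config_path)
  # Unspecified: fall back to the folder that holds the annotation file.
  if (is.null(data_folder)) {
    return(annotation_dir)
  }
  # Assumed test for an absolute path: leading "/", "~", or a drive letter.
  if (grepl("^(/|~|[A-Za-z]:)", data_folder)) {
    return(data_folder)
  }
  # Anything else is taken relative to the annotation file's location.
  file.path(annotation_dir, data_folder)
}

# Illustrative paths only.
resolve_data_folder("/home/user/selex/config.xml")
resolve_data_folder("/home/user/selex/config.xml", "fastq")
resolve_data_folder("/home/user/selex/config.xml", "/data/fastq")
```

The first call falls back to the annotation file's folder, the second resolves the folder relative to it, and the third leaves the absolute path unchanged.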

Details

A sample annotation file is an XML file that acts as a database storing metadata for different SELEX experiments. Here, a SELEX experiment refers to a single SELEX round that has been sequenced. Such a database allows the user to explicitly store all relevant information in a structured manner for easy future access.

An example annotation file is shown in the Note section. An annotation file can contain multiple SequencingRunInfo instances, and every instance within a file must have a unique name. If multiple annotation files are loaded in a single SELEX session, the names must also be unique across all of those files. For example, the following annotation files A and B

File A
<SequencingRunInfo name="exdUbx.Run1"> and
<SequencingRunInfo name="exdUbx.Run2">

File B
<SequencingRunInfo name="exdUbx.Run1">

have valid names if only File A or only File B is loaded in a SELEX session, but not if both are loaded, because the name exdUbx.Run1 would then appear twice. In general, it is a good idea to keep every SequencingRunInfo name unique. Every SequencingRunInfo instance references a single FASTQ file, and the user can optionally provide additional metadata about that file.
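Before loading several annotation files into one session, the naming rule above could be checked with a short base R sketch. The helper run_names is hypothetical and not part of SELEX, and the regular expression is a simplification that assumes each SequencingRunInfo opening tag sits on a single line. For real files, the character vectors would come from readLines() on each annotation file.

```r
# Hypothetical helper (not part of SELEX): collect SequencingRunInfo names
# from the raw text lines of one annotation file.
run_names <- function(xml_lines) {
  pattern <- '<SequencingRunInfo[^>]*name="([^"]*)"'
  hits <- regmatches(xml_lines, regexec(pattern, xml_lines))
  # Each hit is c(full match, captured name); lines without a match are empty.
  unlist(lapply(hits, function(h) if (length(h) == 2) h[2] else NULL))
}

# The two files from the example above, as in-memory text.
file_a <- c('<SequencingRunInfo name="exdUbx.Run1">',
            '<SequencingRunInfo name="exdUbx.Run2">')
file_b <- c('<SequencingRunInfo name="exdUbx.Run1">')

# Names that occur more than once across both files.
all_names <- c(run_names(file_a), run_names(file_b))
clashes <- unique(all_names[duplicated(all_names)])
clashes
```

Here clashes holds exdUbx.Run1, the name that would conflict if both files were loaded in the same session.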

A SequencingRunInfo instance can contain multiple Samples. Every Sample within a SequencingRunInfo instance must have a unique combination of name and round. For example,

<Sample name="exdLab" round="0"> and
<Sample name="exdLab" round="1">

is a valid combination of Samples, while

<Sample name="exdLab" round="0"> and
<Sample name="exdLab" round="0">

is not. Samples from rounds other than Round 0 can optionally reference a Round 0 file; this acts as a check that prevents the wrong Round 0 sample from being used to analyze a later-round sample.
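The name and round uniqueness rule can be expressed in a few lines of base R. This is only a sketch on hand-typed, illustrative data; it does not read an annotation file and is not how SELEX itself validates Samples.

```r
# Illustrative data only: Sample entries from one SequencingRunInfo instance.
samples <- data.frame(
  name  = c("exdLab", "exdLab", "exdLab"),
  round = c(0, 1, 0),
  stringsAsFactors = FALSE
)

# duplicated() on a data frame compares whole rows, i.e. name/round pairs.
is_repeat <- duplicated(samples)

# Rows that repeat an earlier name/round combination.
samples[is_repeat, ]

# TRUE when the set of Samples violates the uniqueness rule.
any(is_repeat)
```

The third row repeats the exdLab / round 0 combination from the first row, so it is flagged as a repeat.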

Once samples have been loaded into the current SELEX session, a sample handle can be generated using the SequencingRunInfo name, Sample name, and Round number. Sample handles make it easier to reference individual Samples while running an analysis. See selex.sample for more information.

Value

Not applicable

Note

Sample annotation files are structured as follows:

<?xml version="1.0" encoding="UTF-8"?>

<SELEXSequencingConfig xmlns:xsi="http://www.columbia.edu/" xsi:noNamespaceSchemaLocation="selex.xsd">

<SequencingRunInfo name="exdUbx.exdScr.0"> <!-- information needed for differentiating multiple sequencing info instances -->

<DataFile>/Users/Documents/Data/Run1.fastq.gz</DataFile> <!-- absolute or relative path -->
<SequencingPlatform>Illumina</SequencingPlatform> <!-- #optional -->
<ResearcherName>John Smith</ResearcherName> <!-- #optional -->
<ResearcherEmail>jsmith@columbia.edu</ResearcherEmail> <!-- #optional -->
<SequencingFacilityName>Columbia University Genome Center</SequencingFacilityName> <!-- #optional -->
<SequencingFacilityEmail>cugc@columbia.edu</SequencingFacilityEmail> <!-- #optional -->
<Description>Ubx/Scr Round 0 Probes</Description> <!-- #optional -->
<Notes>Our first SELEX Run</Notes> <!-- #optional -->

<Sample name="barcodeCCAGCTG.v1" round="0">
<Protein>Probes</Protein>
<Concentration></Concentration> <!-- #optional -->
<VariableRegionLength>16</VariableRegionLength>
<LeftFlank>GTTCAGAGTTCTACAGTCCGACGATCTGG</LeftFlank>
<RightFlank>CCAGCTGTCGTATGCCGTCTTCTGCTTG</RightFlank>
<LeftBarcode>TGG</LeftBarcode>
<RightBarcode>CCAGCTG</RightBarcode>
<Round0></Round0>
<Notes></Notes> <!-- #optional -->
</Sample>

<Sample name="barcodeCCACGTC.v1" round="0">
<Protein>Probes</Protein>
<Concentration></Concentration>
<VariableRegionLength>16</VariableRegionLength>
<LeftFlank>GTTCAGAGTTCTACAGTCCGACGATCTGG</LeftFlank>
<RightFlank>CCACGTCTCGTATGCCGTCTTCTGCTTG</RightFlank>
<LeftBarcode>TGG</LeftBarcode>
<RightBarcode>CCACGTC</RightBarcode>
<Round0></Round0>
<Notes></Notes>
</Sample>

</SequencingRunInfo>

<!-- #New FASTQ file below -->

<SequencingRunInfo name="exdUbx.exdScr.L.2">

<DataFile>/Users/Documents/Data/Run2.fastq.gz</DataFile>
<SequencingPlatform>Illumina</SequencingPlatform>
<ResearcherName>John Smith</ResearcherName>
<ResearcherEmail>jsmith@columbia.edu</ResearcherEmail>
<SequencingFacilityName>Columbia University Genome Center</SequencingFacilityName>
<SequencingFacilityEmail>cugc@columbia.edu</SequencingFacilityEmail>
<Description>Ubx/Scr Round 2</Description>
<Notes>Our first SELEX Run</Notes>

<Sample name="barcodeCCAGCTG.v1.low" round="2">
<Protein>hmExdUbx</Protein>
<Concentration>low</Concentration>
<VariableRegionLength>16</VariableRegionLength>
<LeftFlank>GTTCAGAGTTCTACAGTCCGACGATCTGG</LeftFlank>
<RightFlank>CCAGCTGTCGTATGCCGTCTTCTGCTTG</RightFlank>
<LeftBarcode>TGG</LeftBarcode>
<RightBarcode>CCAGCTG</RightBarcode>
<Round0 sequencingName="exdUbx.exdScr.0" sampleName="barcodeCCAGCTG.v1"/>
<Notes></Notes>
</Sample>

</SequencingRunInfo>

</SELEXSequencingConfig>

See Also

selex.defineSample, selex.getAttributes, selex.sample, selex.sampleSummary, selex.saveAnnotation

Examples

#Initialize the SELEX package
#options(java.parameters="-Xmx1500M")
#library(SELEX) 

# Configure the current session
workDir = file.path(".", "SELEX_workspace")
selex.config(workingDir=workDir, verbose=FALSE, maxThreadNumber=4)

# Extract sample data from package, including XML database
sampleFiles = selex.exampledata(workDir)

# Load & display all sample files using XML database
selex.loadAnnotation(sampleFiles[3])
selex.sampleSummary()

# Create a sample handle
r0 = selex.sample(seqName="R0.libraries", sampleName="R0.barcodeGC", round=0)

# Use the sample handle to display sample properties
selex.getAttributes(r0)

SELEX documentation built on Nov. 8, 2020, 5:22 p.m.