krauseNiazi2019Analyses: Reproducible analysis for tailfindr paper

About the data

When drake::r_make() is run, the following CSV files are downloaded to the data folder. Each file and its columns are described below.

Datasets

1. Krause/Niazi et al. DNA data

The DNA datasets below contain data from two replicates: the first was obtained with the SQK-LSK108 kit and the second with the SQK-LSK109 kit.

1. dna-krause-lsk108_109-standard_basecalling-tailfindr_estimates-two_replicates-with_filepaths.csv

tailfindr estimates for data basecalled with the standard model.

2. dna-krause-lsk108_109-flipflop_basecalling-tailfindr_estimates-two_replicates-with_filepaths.csv

tailfindr estimates for data basecalled with the flip-flop model.

3. dna-krause-lsk108_109-flipflop_basecalling-decoded_barcodes-two_replicates-with_filepaths.csv

Barcode assignment output for data basecalled with the flip-flop model.

4. dna-krause-lsk108_109-standard_basecalling-transcript_alignment_start-two_replicates-with_filepaths.csv

Transcript alignment start information for data basecalled with the standard model.

5. dna-krause-lsk108_109-flipflop_basecalling-transcript_alignment_start-two_replicates-with_filepaths.csv

Transcript alignment start information for data basecalled with the flip-flop model.

6. dna-krause-lsk108_109-standard_basecalling-moves_in_tail-two_replicates-with_filepaths.csv

Total number of moves within the poly(A)/poly(T) tail boundaries for data basecalled with the standard model.

7. dna-krause-lsk108_109-flipflop_basecalling-moves_in_tail-two_replicates-with_filepaths.csv

Total number of moves within the poly(A)/poly(T) tail boundaries for data basecalled with the flip-flop model.

2. Krause/Niazi et al. RNA data

For RNA data, we obtained two replicates with the SQK-RNA001 sequencing kit and, after receiving the reviews, a third replicate with the SQK-RNA002 kit. The SQK-RNA001 replicates included the reverse-transcription step, whereas the SQK-RNA002 replicate was obtained without it.

rna-krause-rna001-standard_basecalling-tailfindr_estimates-two_replicates-with_filepaths.csv

tailfindr estimates for data basecalled with the standard model. Contains both SQK-RNA001 replicates.

rna-krause-rna001-standard_basecalling-nanopolish_estimates-two_replicates-with_filepaths.csv

Nanopolish estimates for data basecalled with the standard model. Contains both SQK-RNA001 replicates.

rna-krause-rna001-standard_basecalling-decoded_barcodes-two_replicates-with_filepaths.csv

Barcode assignment output for data basecalled with the standard model. Contains both SQK-RNA001 replicates.

rna-krause-rna001-standard_basecalling-transcript_alignment_start-two_replicates-with_filepaths.csv

Transcript alignment start information for data basecalled with the standard model. Contains both SQK-RNA001 replicates.

rna-krause-rna002-standard_basecalling-tailfindr_estimates-with_filepaths.csv

tailfindr estimates for data basecalled with the standard model. Contains the single SQK-RNA002 replicate.

rna-krause-rna002-standard_basecalling-nanopolish_estimates-with_filepaths.csv

Nanopolish estimates for data basecalled with the standard model. Contains the single SQK-RNA002 replicate.

rna-krause-rna002-standard_basecalling-decoded_barcodes-with_filepaths.csv

Barcode assignment output for data basecalled with the standard model. Contains the single SQK-RNA002 replicate.

rna-krause-rna002-standard_basecalling-transcript_alignment_start-with_filepaths.csv

Transcript alignment start information for data basecalled with the standard model. Contains the single SQK-RNA002 replicate.

3. Workman et al. RNA data

rna-workman-rna001-standard_basecalling-tailfindr_estimates-all_datasets-with_filepaths.csv

tailfindr estimates for the Workman et al. data, re-basecalled with the standard model.

rna-workman-rna001-standard_basecalling-nanopolish_estimates-all_datasets-with_filepaths.csv

Nanopolish estimates for the Workman et al. data, re-basecalled with the standard model.
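The CSV file names above appear to follow a dash-separated pattern: molecule, data source, sequencing kit, basecalling model, content, an optional replicate descriptor, and the suffix with_filepaths. This layout is inferred from the listed names only; it is not documented as a convention. A minimal Python sketch (parse_dataset_name and DatasetName are hypothetical names) that splits a file name into those fields, for example to group files programmatically:

```python
from dataclasses import dataclass
from pathlib import PurePath
from typing import Optional


@dataclass(frozen=True)
class DatasetName:
    """Fields encoded in a dataset file name (inferred layout, not an official spec)."""

    molecule: str              # "dna" or "rna"
    source: str                # "krause" or "workman"
    kit: str                   # e.g. "lsk108_109", "rna001", "rna002"
    basecalling: str           # "standard" or "flipflop"
    content: str               # e.g. "tailfindr_estimates", "decoded_barcodes"
    replicates: Optional[str]  # e.g. "two_replicates", "all_datasets", or None


def parse_dataset_name(filename: str) -> DatasetName:
    """Split a dataset CSV file name into its dash-separated fields."""
    path = PurePath(filename)
    if path.suffix != ".csv":
        raise ValueError(f"not a CSV file: {filename}")
    parts = path.stem.split("-")
    # Expect 5 fixed fields, an optional replicate field, then "with_filepaths".
    if len(parts) not in (6, 7) or parts[-1] != "with_filepaths":
        raise ValueError(f"unexpected file name layout: {filename}")
    molecule, source, kit, basecalling, content = parts[:5]
    replicates = parts[5] if len(parts) == 7 else None
    return DatasetName(
        molecule=molecule,
        source=source,
        kit=kit,
        basecalling=basecalling.removesuffix("_basecalling"),  # Python 3.9+
        content=content,
        replicates=replicates,
    )
```

For instance, rna-krause-rna002-standard_basecalling-nanopolish_estimates-with_filepaths.csv parses to kit "rna002", basecalling "standard" and no replicate descriptor. The .model files do not follow this pattern and raise a ValueError.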

4. ONT k-mer model files

r9.4_180mv_450bps_6mer_template_median68pA.model

DNA k-mer model. Used to calculate the thresholds in our DNA tail-finding algorithm.

r9.4_180mv_70bps_5mer_RNA_template_median69pA.model

RNA k-mer model. Used to calculate the thresholds in our RNA tail-finding algorithm.

Column descriptions

tailfindr CSV files

read_id: Read ID
read_type: Whether the read is a poly(A) read, a poly(T) read, or an invalid read. Only reported for DNA datasets.
tail_is_valid: Whether a poly(A)-tailed read is full-length. This matters because the poly(A) tail lies at the end of the read, and premature termination of reads is prevalent in cDNA. Only reported for DNA datasets.
tail_start: Sample index of the start of the tail in the raw data
tail_end: Sample index of the end of the tail in the raw data
samples_per_nt: Read rate, in samples per nucleotide
tail_length: Tail length in nucleotides, computed as (tail_end - tail_start) / samples_per_nt
file_path: Full path to the read file. Only relevant for internal use within the Valen lab.
replicate: Replicate number
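Because tail_length is derived from three other columns, it can be recomputed after loading, for example as a sanity check. The Python sketch below does this with the standard csv module and also drops DNA poly(A) reads flagged as not full-length. It assumes missing values are written as NA or left empty and that tail_is_valid is written as TRUE/FALSE (R's defaults when exporting); tail_length_nt and load_tail_lengths are hypothetical helper names.

```python
import csv
from typing import Dict, Iterable, Optional

# Assumed encodings of missing values in the R-exported CSV files.
MISSING = {"", "NA"}


def _to_float(value: str) -> Optional[float]:
    """Parse a numeric cell, returning None for a missing value."""
    return None if value in MISSING else float(value)


def tail_length_nt(row: Dict[str, str]) -> Optional[float]:
    """Recompute tail_length = (tail_end - tail_start) / samples_per_nt for one row."""
    start = _to_float(row["tail_start"])
    end = _to_float(row["tail_end"])
    rate = _to_float(row["samples_per_nt"])
    if start is None or end is None or not rate:
        return None  # no tail detected, or no usable read rate
    return (end - start) / rate


def load_tail_lengths(lines: Iterable[str]) -> Dict[str, float]:
    """Map read_id -> recomputed tail length, skipping reads without a usable tail."""
    lengths: Dict[str, float] = {}
    for row in csv.DictReader(lines):
        # DNA files carry tail_is_valid; drop poly(A) reads flagged as not full-length.
        if row.get("tail_is_valid") == "FALSE":
            continue
        length = tail_length_nt(row)
        if length is not None:
            lengths[row["read_id"]] = length
    return lengths
```

Usage: with open(path, newline="") as f: lengths = load_tail_lengths(f). RNA files have no tail_is_valid column, so no rows are filtered on it.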

Barcode assignment/decoding CSV files

read_id: Read ID
file_path: Full path to the read file. Only relevant for internal use within the Valen lab.
read_type: For DNA, whether the read is a GFP-containing poly(A) read, a GFP-containing poly(T) read, or an invalid read. For RNA, whether the read is a GFP-containing poly(A) read or an invalid read.
read_length: Length of the read, in bases reported for that read
read_too_long: Whether the read is longer than 900 nt
read_too_short: Whether the read is shorter than 900 nt
nas_gfp: Normalized alignment score for the GFP alignment
nas_rc_gfp: Normalized alignment score for the alignment of the reverse complement of GFP
nas_fp: Normalized alignment score for the front primer alignment
nas_rc_fp: Normalized alignment score for the alignment of the reverse complement of the front primer
nas_10bp: Normalized alignment score for alignment of barcode 10
nas_30bp: Normalized alignment score for alignment of barcode 30
nas_40bp: Normalized alignment score for alignment of barcode 40
nas_60bp: Normalized alignment score for alignment of barcode 60
nas_100bp: Normalized alignment score for alignment of barcode 100
nas_150bp: Normalized alignment score for alignment of barcode 150
nas_slip: Normalized alignment score for alignment of the slip barcode. For internal use only; ignore it.
barcode: Barcode with the highest normalized alignment score
barcode_tie: Whether another barcode has the same highest normalized alignment score
barcode_2: The barcode that ties with the first barcode for the highest normalized alignment score
barcode_passed_threshold: Whether the highest normalized alignment score passes the minimum threshold of 0.6
replicate: Replicate number
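The barcode, barcode_tie, barcode_2 and barcode_passed_threshold columns can be read as the outcome of a simple rule over the nas_* scores. The Python sketch below reconstructs that rule from the column descriptions alone; it is not the authors' decoding code. It assumes that a score equal to 0.6 counts as passing (the description does not say whether the comparison is strict), that nas_slip is excluded, and that barcodes are labelled by their column suffix (10bp, 30bp, ...); assign_barcode is a hypothetical name.

```python
from typing import Dict

# Barcode columns named in the column descriptions; nas_slip is deliberately ignored.
BARCODES = ("10bp", "30bp", "40bp", "60bp", "100bp", "150bp")
THRESHOLD = 0.6  # minimum normalized alignment score from the column description


def assign_barcode(row: Dict[str, str]) -> Dict[str, object]:
    """Derive the four barcode columns from one row's nas_* scores (reconstruction)."""
    scores = {b: float(row[f"nas_{b}"]) for b in BARCODES}
    # Highest score first; sorted() is stable, so equal scores keep BARCODES order.
    ranked = sorted(BARCODES, key=lambda b: scores[b], reverse=True)
    best, runner_up = ranked[0], ranked[1]
    tie = scores[runner_up] == scores[best]
    return {
        "barcode": best,
        "barcode_tie": tie,
        "barcode_2": runner_up if tie else None,
        # Assumption: a score equal to the threshold counts as passing.
        "barcode_passed_threshold": scores[best] >= THRESHOLD,
    }
```

Comparing this rule's output with the stored columns is one way to check how ties and threshold edge cases were handled in the released files.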

Transcript alignment start CSV files

transcript_alignment_start: Sample index of the junction point of the eGFP transcript and the poly(A)/poly(T) tail
read_id: Read ID
file_path: Full path to the read file. Only relevant for internal use within the Valen lab.
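Every file carries a read ID column, so files can be combined per read. A minimal Python sketch of an inner join between a tailfindr estimates file and a transcript alignment start file, using only the standard csv module and assuming each read ID appears once per file; join_on_read_id is a hypothetical name.

```python
import csv
from typing import Dict, Iterable, List


def join_on_read_id(
    tailfindr_lines: Iterable[str], alignment_lines: Iterable[str]
) -> List[Dict[str, str]]:
    """Inner-join tailfindr estimates with transcript alignment starts on read_id."""
    # Index the alignment file by read ID (assumes each read ID appears once).
    junctions = {row["read_id"]: row for row in csv.DictReader(alignment_lines)}
    merged = []
    for row in csv.DictReader(tailfindr_lines):
        match = junctions.get(row["read_id"])
        if match is None:
            continue  # read has no transcript alignment start
        # Keep tailfindr's columns and add only the junction sample index,
        # so the two file_path columns do not collide.
        merged.append({**row, "transcript_alignment_start": match["transcript_alignment_start"]})
    return merged
```

All values stay as strings, as csv returns them. Since tail_start, tail_end and transcript_alignment_start are all described as sample indices, the merged rows make it possible to compare tailfindr's tail boundaries with the transcript junction once converted to numbers.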

Moves in the tail CSV files

read_id: Read ID
moves_in_tail_st: Number of moves between the tail start and end boundaries, for data basecalled with the standard model
moves_in_tail_ff: Number of moves between the tail start and end boundaries, for data basecalled with the flip-flop model

Nanopolish estimation CSV files

readname: Read ID
contig
position
leader_start
adapter_start
polya_start: Sample index of the start of the tail
transcript_start: Sample index of the end of the tail
read_rate
polya_length
qc_tag
file_path: Full path to the read file. Only relevant for internal use within the Valen lab.
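Nanopolish names the read ID column readname rather than read_id, and flags each estimate in qc_tag, where PASS marks estimates that passed Nanopolish's quality control. A minimal Python sketch that keeps only PASS estimates and pairs them with tailfindr tail lengths by read ID. It assumes the files are comma-separated, as their .csv extension suggests, and that the tailfindr lengths are already available as a read_id to tail_length mapping; the function names are hypothetical.

```python
import csv
from typing import Dict, Iterable, Tuple


def nanopolish_pass_lengths(lines: Iterable[str]) -> Dict[str, float]:
    """Map read ID -> polya_length, keeping only estimates whose qc_tag is PASS."""
    lengths: Dict[str, float] = {}
    for row in csv.DictReader(lines):
        if row["qc_tag"] == "PASS":
            # Nanopolish calls the read ID column "readname" instead of "read_id".
            lengths[row["readname"]] = float(row["polya_length"])
    return lengths


def paired_estimates(
    tailfindr: Dict[str, float], nanopolish: Dict[str, float]
) -> Dict[str, Tuple[float, float]]:
    """Return read_id -> (tailfindr length, Nanopolish length) for reads in both."""
    shared = tailfindr.keys() & nanopolish.keys()
    return {read_id: (tailfindr[read_id], nanopolish[read_id]) for read_id in shared}
```

Restricting the comparison to reads present in both tools' outputs keeps the paired estimates on the same set of reads.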

adnaniazi/krauseNiazi2019Analyses documentation built on June 9, 2019, 7:22 p.m.
