prepUMI4C: Prepare UMI4C data

View source: R/contactsUMI4C.R

Description

Prepares the FastQ files for further analysis by selecting the reads that contain the bait and adding each read's UMI identifier to its header.
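The selection and tagging step can be pictured with a small base R sketch. This is not the package's implementation: it assumes exact prefix matching against bait_seq + bait_pad + res_enz, and it assumes that the UMI is the first 10 bases of the paired read and is appended to the header after a colon. The names reads_r1, reads_r2 and umi_len, the toy sequences, and the header format are all hypothetical.

```r
bait_seq <- "GGACAAGCTCCCTGCAACTCA"
bait_pad <- "GGACTTGCA"
res_enz  <- "GATC"

# Sequence a read is expected to start with: bait primer + pad + restriction site.
full_bait <- paste0(bait_seq, bait_pad, res_enz)

# Toy first-mate reads; the vector names stand in for the FastQ headers.
reads_r1 <- c(
  read1 = paste0(full_bait, "TTTTACGT"),
  read2 = "TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT",
  read3 = paste0(full_bait, "AAAACCCC")
)

# Toy second-mate reads. ASSUMPTION: the UMI is their first umi_len bases.
reads_r2 <- c(
  read1 = "ACGTACGTACGGGGGGGG",
  read2 = "CCCCCCCCCCAAAAAAAA",
  read3 = "TTGCATTGCAGGGGTTTT"
)
umi_len <- 10

# Keep only the reads that start with the full bait sequence (exact match).
has_bait <- startsWith(reads_r1, full_bait)

# Extract the UMI and append it to the header (hypothetical "header:UMI" format).
umi <- substr(reads_r2, 1, umi_len)
new_headers <- paste0(names(reads_r1), ":", umi)

filtered <- reads_r1[has_bait]
names(filtered) <- new_headers[has_bait]

print(names(filtered))
```

In this toy input, read2 lacks the bait and is dropped, while the two remaining reads carry their UMI in the header.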

Usage

prepUMI4C(
  fastq_dir,
  wk_dir,
  file_pattern = NULL,
  bait_seq,
  bait_pad,
  res_enz,
  numb_reads = 1e+11
)

Arguments

fastq_dir

Path of the directory containing the FastQ files (compressed or uncompressed).

wk_dir

Working directory in which to save the outputs generated by the UMI-4C analysis.

file_pattern

Character string used to select which files in fastq_dir are analyzed.

bait_seq

Character containing the bait primer sequence.

bait_pad

Character containing the pad sequence (sequence between the bait primer and the restriction enzyme sequence).

res_enz

Character containing the restriction enzyme sequence.

numb_reads

Number of lines from the FastQ file to load in each loop. If you run into memory problems, set it to a smaller number. Default: 1e+11.
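To see why a smaller numb_reads lowers memory use, consider reading a FastQ file in fixed-size chunks. The base R sketch below only illustrates the general idea and is not how prepUMI4C is implemented. The variable lines_per_chunk is a hypothetical stand-in for numb_reads, and the file content is toy data.

```r
# Write a tiny FastQ file (3 records, 4 lines each) to a temporary location.
fq_path <- tempfile(fileext = ".fastq")
writeLines(c(
  "@read1", "ACGT", "+", "IIII",
  "@read2", "TTGA", "+", "IIII",
  "@read3", "CCAG", "+", "IIII"
), fq_path)

# Hypothetical stand-in for numb_reads: number of lines loaded per iteration.
# A multiple of 4 keeps every FastQ record inside a single chunk.
lines_per_chunk <- 8

con <- file(fq_path, open = "r")
n_chunks <- 0
n_records <- 0
repeat {
  # Only lines_per_chunk lines are held in memory at any time.
  chunk <- readLines(con, n = lines_per_chunk)
  if (length(chunk) == 0) break
  n_chunks <- n_chunks + 1
  n_records <- n_records + length(chunk) / 4
}
close(con)
unlink(fq_path)

cat("chunks:", n_chunks, "records:", n_records, "\n")
```

Smaller chunks mean more iterations but a lower peak memory footprint, which is the trade-off the numb_reads argument exposes.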

Value

Creates a compressed FastQ file in wk_dir/prep named basename(fastq).fq.gz, containing the filtered reads with the UMI sequence in the header. A log file with the statistics, named umi4c_stats.txt, is also generated in wk_dir/logs.
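After a run, the outputs described above can be located with base R path helpers. The sketch below assumes only the directory layout stated in this section (wk_dir/prep and wk_dir/logs/umi4c_stats.txt). The wk_dir value is a hypothetical placeholder and should be replaced with the directory passed to prepUMI4C.

```r
# Hypothetical working directory; use the wk_dir given to prepUMI4C().
wk_dir <- file.path(tempdir(), "CIITA")

# Locations described in the Value section.
prep_dir <- file.path(wk_dir, "prep")
log_file <- file.path(wk_dir, "logs", "umi4c_stats.txt")

# List the prepared, compressed FastQ files (empty if nothing has been run yet).
prepped <- list.files(prep_dir, pattern = "\\.fq\\.gz$", full.names = TRUE)
print(prepped)

# Show the statistics log only if it exists.
if (file.exists(log_file)) {
  print(readLines(log_file))
}
```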

See Also

contactsUMI4C.

Examples

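library(UMI4Cats)

if (interactive()) {
    # Download the reduced example dataset and locate the raw FastQ files.
    path <- downloadUMI4CexampleData(reduced = TRUE)
    raw_dir <- file.path(path, "CIITA", "fastq")

    # Select bait-containing reads and write the prepared files under wk_dir.
    prepUMI4C(
        fastq_dir = raw_dir,
        wk_dir = file.path(path, "CIITA"),
        bait_seq = "GGACAAGCTCCCTGCAACTCA",
        bait_pad = "GGACTTGCA",
        res_enz = "GATC"
    )
}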

UMI4Cats documentation built on Dec. 31, 2020, 2:01 a.m.