rename_fq: Rename fastq files


View source: R/rename_fq.R

Description

Change of heart concerning the names of your fastq files?

Renaming them shouldn't be a complicated and dangerous venture, even for people with no UNIX bash experience.

This function lets you rename your fastq files without messing up your data. Plus, it fits nicely in a reproducible workflow!

The function is parallelized and can also move the renamed files to a different folder.

Usage

rename_fq(
  change.fq,
  parallel.core = parallel::detectCores() - 1,
  verbose = TRUE
)

Arguments

change.fq

(object, path to a file) Data frame in the global environment or a tab-separated file. The data frame has 2 columns, OLD_FQ and NEW_FQ, that describe the name change you want. If the NEW_FQ column contains a different path, the files will also be moved. See the examples below.

parallel.core

Enable parallel execution using multiple cores. Default: parallel.core = parallel::detectCores() - 1.

verbose

By default, the function is chatty. Default: verbose = TRUE.
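
As a rough illustration of the arguments above, change.fq can also be built directly as a data frame instead of a file. The sketch below reuses the file names from the examples on this page. The n.cores guard is an addition that is not part of the function: parallel::detectCores() can return NA on some systems, so the value is floored at 1. The call itself assumes that rename_fq is exported by the stackr package and only runs if the package is installed and the old files exist.

```r
# Two-column data frame describing the rename (file names are illustrative)
change.fq <- data.frame(
  OLD_FQ = "04_process_radtags/124134249.fastq.gz",
  NEW_FQ = "04_process_radtags/EST-NEL-ADU-2013-0001.fastq.gz",
  stringsAsFactors = FALSE
)

# detectCores() may return NA; fall back to a single core in that case
n.cores <- max(1L, parallel::detectCores() - 1L, na.rm = TRUE)

# Assumption: rename_fq is available as stackr::rename_fq
if (requireNamespace("stackr", quietly = TRUE) &&
    all(file.exists(change.fq$OLD_FQ))) {
  stackr::rename_fq(change.fq = change.fq, parallel.core = n.cores, verbose = TRUE)
}
```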

Value

The function returns nothing to the global environment; it only renames the fq files.
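
Because nothing is returned, you may want to confirm the result on disk yourself. The sketch below is one possible way to do that in base R. check_rename is a hypothetical helper, not part of stackr, and file.rename stands in for rename_fq so the demonstration runs without the package.

```r
# Hypothetical helper: TRUE for each row whose new file exists and whose
# old file is gone (or whose name was left unchanged)
check_rename <- function(change.fq) {
  unchanged <- change.fq$OLD_FQ == change.fq$NEW_FQ
  file.exists(change.fq$NEW_FQ) & (unchanged | !file.exists(change.fq$OLD_FQ))
}

# Demonstration in a temporary folder with an empty placeholder file
tmp <- tempfile("fq_demo_")
dir.create(tmp)
demo <- data.frame(
  OLD_FQ = file.path(tmp, "124134249.fastq.gz"),
  NEW_FQ = file.path(tmp, "EST-NEL-ADU-2013-0001.fastq.gz"),
  stringsAsFactors = FALSE
)
file.create(demo$OLD_FQ)

before <- check_rename(demo)           # nothing has been renamed yet
file.rename(demo$OLD_FQ, demo$NEW_FQ)  # base R stand-in for rename_fq()
after <- check_rename(demo)            # the new name is now on disk

unlink(tmp, recursive = TRUE)
```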

Examples

## Not run: 
# change.fq = "fq.new.naming.tsv". This file contains a data frame
# with 2 columns separated by a TAB: OLD_FQ\tNEW_FQ

# The first line after column headers could be:
# 04_process_radtags/124134249.fastq.gz\t04_process_radtags/EST-NEL-ADU-2013-0001.fastq.gz
# and so on.
# To run:
rename_fq(change.fq = "fq.new.naming.tsv", parallel.core = 12)


# The function can also rename and move files at the same time.
# The first line after column headers could be:
# Downloads/124134249.fastq.gz\t04_process_radtags/EST-NEL-ADU-2013-0001.fastq.gz
# The second column points to a different and existing folder.
# To run, same as above:
rename_fq(change.fq = "fq.new.naming.tsv", parallel.core = 12)
# the moved and renamed fastq files will be in: 04_process_radtags

# Now, how to generate the file fq.new.naming.tsv ? It's actually quite easy within R.

# Below, give the path to the folder containing the fastq files and the fq file extension
fq.path <- "04_process_radtags"
fq.ext <- ".fastq.gz"

# Generate a data frame with the column OLD_FQ and NEW_FQ
# Here, the NEW_FQ column is identical to OLD_FQ (see next step)
# The pipe (%>%) must be available, e.g. after library(magrittr) or library(dplyr)
fq.new.naming <- tibble::tibble(
  OLD_FQ = list.files(path = fq.path, pattern = fq.ext, full.names = TRUE)
) %>%
  dplyr::mutate(NEW_FQ = OLD_FQ)


# Save the file
# (readr >= 1.4.0 names this argument `file`; older versions used `path`)
readr::write_tsv(x = fq.new.naming, file = "fq.new.naming.tsv")

# Then, you could edit the NEW_FQ column by hand, in MS Excel,
# or inside R using the dplyr package and its mutate function.
# All this can be done with very different tools. The stringi package can
# be very handy to parse the OLD_FQ column, etc.

# Generic form (replace the placeholder with your own expression):
# fq.new.naming <- fq.new.naming %>%
#   dplyr::mutate(NEW_FQ = <your expression>)

fq.new.naming <- fq.new.naming %>%
  dplyr::mutate(
    NEW_FQ = stringi::stri_replace_all_fixed(
      str = OLD_FQ,
      pattern = "blablabla",
      replacement = "newblablabla",
      vectorize_all = FALSE
    )
  )


## End(Not run)
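
Before running the function on real data, it can help to check the naming table for obvious problems. The sketch below is a minimal pre-flight check in base R, assuming the two-column layout described above. validate_change_fq is a hypothetical helper, not part of stackr. It looks for old files that do not exist, new names that appear more than once, and destination folders that are missing (the examples above expect the destination folder to exist already).

```r
# Hypothetical helper: returns a character vector of problems (empty if none)
validate_change_fq <- function(change.fq) {
  if (!all(c("OLD_FQ", "NEW_FQ") %in% names(change.fq))) {
    return("change.fq must contain the columns OLD_FQ and NEW_FQ")
  }
  problems <- character(0)

  # Old files that are not on disk
  missing.old <- change.fq$OLD_FQ[!file.exists(change.fq$OLD_FQ)]
  if (length(missing.old) > 0) {
    problems <- c(problems, paste("missing OLD_FQ file:", missing.old))
  }

  # New names used more than once would collide
  dup.new <- unique(change.fq$NEW_FQ[duplicated(change.fq$NEW_FQ)])
  if (length(dup.new) > 0) {
    problems <- c(problems, paste("duplicated NEW_FQ name:", dup.new))
  }

  # Destination folders that do not exist yet
  new.dirs <- unique(dirname(change.fq$NEW_FQ))
  missing.dir <- new.dirs[!dir.exists(new.dirs)]
  if (length(missing.dir) > 0) {
    problems <- c(problems, paste("missing destination folder:", missing.dir))
  }

  problems
}

# Demonstration with a deliberately flawed table in a temporary folder
tmp <- tempfile("fq_check_")
dir.create(tmp)
file.create(file.path(tmp, "a.fastq.gz"))
demo <- data.frame(
  OLD_FQ = file.path(tmp, c("a.fastq.gz", "b.fastq.gz")),
  NEW_FQ = file.path(tmp, "renamed", c("x.fastq.gz", "x.fastq.gz")),
  stringsAsFactors = FALSE
)
problems <- validate_change_fq(demo)
print(problems)
unlink(tmp, recursive = TRUE)
```

An empty result means none of these three problems was found; it does not guarantee that the rename itself will succeed.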

thierrygosselin/stackr documentation built on Nov. 11, 2020, 11 a.m.