dupes: Duplicate lines in file

Description

Number of duplicates per line of a (text) file. By default, the result is saved to a file that can be loaded into Excel or LibreOffice Calc. With conditional formatting of the first column, colors show for each line how often it occurs in the file. A LibreOffice template file is included in the package. Note: OpenOffice does not provide color scales based on cell values.
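The counting idea can be sketched in a few lines of base R. This is a minimal sketch, not the dupes() source: the helper count_dupes() is hypothetical, it counts each line's occurrences including the line itself (dupes() may define "duplicates" differently), and giving empty lines a count of 0 when ignore.empty=TRUE is an assumption.

```r
# Hypothetical helper, not the dupes() source: for each line, count how many
# lines are identical to it (the line itself included).
count_dupes <- function(lines, ignore.empty = TRUE, ignore.space = TRUE) {
  if (ignore.space) lines <- trimws(lines)           # drop leading/trailing whitespace
  n <- ave(seq_along(lines), lines, FUN = length)    # size of each line's group
  if (ignore.empty) n[lines == ""] <- 0L             # assumption: empty lines count 0
  n
}

# With a real file: count_dupes(readLines(file))
count_dupes(c("a", " a", "b", "", ""))
# [1] 2 2 1 0 0
```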

Usage

dupes(
  file,
  ignore.empty = TRUE,
  ignore.space = TRUE,
  tofile = missing(n),
  n = length(d)
)

Arguments

file

File name (character string)

ignore.empty

Should empty lines be ignored? DEFAULT: TRUE

ignore.space

Should leading/trailing whitespace be ignored? DEFAULT: TRUE

tofile

Logical: should output be directed to a file? Otherwise, a data.frame with line numbers and the number of duplicates of each line is printed in the console. DEFAULT: missing(n), i.e. TRUE unless n is given

n

Show only the first n values if tofile=FALSE. DEFAULT: length(d), where d is computed inside the function, so all values are shown
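The defaults tofile = missing(n) and n = length(d) rely on R's lazy evaluation of default arguments: they are evaluated inside the function only when first used, so they can refer to other arguments and to objects created in the function body. The function f() below is purely hypothetical and only illustrates that mechanism.

```r
# Hypothetical function illustrating lazily evaluated defaults,
# the mechanism behind dupes(tofile = missing(n), n = length(d))
f <- function(tofile = missing(n), n = length(d)) {
  d <- 1:10                   # d only exists inside the function body
  if (tofile) return("write to file")
  head(d, n)                  # n is evaluated here, after d has been created
}

f()                # [1] "write to file"   (n not supplied, so tofile is TRUE)
f(n = 3)           # [1] 1 2 3             (n supplied, so tofile is FALSE)
f(tofile = FALSE)  # all 10 values, since n defaults to length(d)
```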

Value

Either: a data.frame with the line numbers of duplicated lines and their number of duplicates (if tofile=FALSE).
Or: a file containing the number of duplicates and the original file content (if tofile=TRUE).
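The two output shapes can be approximated as follows. This is a hedged sketch under stated assumptions: the column names line, number and text are hypothetical, counts include the line itself, and the tab-separated layout of the written file is a guess, not the actual format dupes() produces.

```r
# Hypothetical sketch of the two output shapes; column names are assumptions.
lines  <- c("a", "b", "a", "c", "a")
counts <- ave(seq_along(lines), lines, FUN = length)   # occurrences per line

# 1) console variant: line numbers of duplicated lines and their counts
res <- data.frame(line = seq_along(lines), number = counts)
res <- res[res$number > 1, ]
res <- res[order(res$number, decreasing = TRUE), ]
print(res)

# 2) file variant: count next to the original content, tab-separated,
#    e.g. for opening in LibreOffice Calc
out <- tempfile(fileext = ".txt")
write.table(data.frame(number = counts, text = lines), out,
            sep = "\t", quote = FALSE, row.names = FALSE)
```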

Note

This function has not been tested extensively; feedback is very welcome!

Author(s)

Berry Boessenkool, berry-b@gmx.de, Dec 2014

See Also

compareFiles

Examples


file <- system.file("extdata/doublelines.txt", package="berryFunctions")
dupes(file, tofile=FALSE)
dupes(file, tofile=FALSE, ignore.empty=FALSE)

## These are skipped by R CMD check (opening external programs is not allowed):
## Not run: dupes(file)

# A template file (dupes.ods) for LibreOffice Calc is available here:
system.file("extdata", package="berryFunctions")

## Not run: system2("nautilus", system.file("extdata/dupes.ods", package="berryFunctions"))

# To open folders with system2:
# "nautilus" (GNOME) or "dolphin" (KDE) on Linux
# "open" on macOS
# "explorer" or "start" on Windows
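Choosing the folder-opening command per operating system can be automated. The helper open_folder_cmd() below is hypothetical and only maps Sys.info()[["sysname"]] to the commands listed above; desktop environments differ, so the Linux choice in particular is an assumption.

```r
# Hypothetical helper: pick a file-manager command for the current OS
open_folder_cmd <- function(sysname = Sys.info()[["sysname"]]) {
  switch(sysname,
         Linux   = "nautilus",   # GNOME; use "dolphin" on KDE
         Darwin  = "open",       # macOS
         Windows = "explorer",
         stop("unknown OS: ", sysname))
}

folder <- system.file("extdata", package = "berryFunctions")
## Not run: system2(open_folder_cmd(), folder)
```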


berryFunctions documentation built on May 29, 2024, 4:01 a.m.