utils: Generic internal functions used by genespace
In jtlovell/GENESPACE: Synteny- and orthology-constrained comparative genomics

utils

R Documentation

Description

utils Convenience functions for genespace, not meant to be called directly by the user. Little documentation is provided; use at your own risk.

.onAttach startup messages

check_integer Checks and parses integer arguments to GENESPACE functions. Replaces values of x outside the range (min, max) with the minimum or maximum value. If a single value is specified and it is not coercible to an integer, returns the default value. If na.rm = TRUE and onlySingleValue = FALSE, drops NAs from the vector.
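
The behavior described for check_integer can be illustrated with a minimal base R sketch. This is a hypothetical re-implementation (the name check_integer_sketch is invented here) written only from the description above; it is not the GENESPACE source and may differ from it in edge cases.

```r
# Hypothetical sketch of the documented behavior; not the GENESPACE source.
check_integer_sketch <- function(x, min = -Inf, max = Inf, default = NA,
                                 na.rm = FALSE,
                                 onlySingleValue = length(x) <= 1) {
  # Coerce quietly; anything that cannot be parsed becomes NA
  xi <- suppressWarnings(as.integer(x))
  if (onlySingleValue) {
    xi <- xi[1]
    # Fall back to the default when the single value is unusable
    if (is.na(xi)) return(default)
  }
  # Clamp out-of-range values to the nearest bound (NAs are preserved)
  xi <- as.integer(pmax(pmin(xi, max), min))
  # Optionally drop NAs, but only when a vector is allowed
  if (na.rm && !onlySingleValue) xi <- xi[!is.na(xi)]
  xi
}

check_integer_sketch(c(1, 50, 200), min = 5, max = 100)
check_integer_sketch("abc", default = 10L)
```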

check_numeric See check_integer. Same but for numeric values.

check_character See check_integer. Same but for character values.

check_logical See check_integer. Same but for logical values.

check_filePathParam QC of user-specified parameter

check_onlyDNA QC to ensure the peptides are actually peptides

read_aaFasta read fasta-formatted peptide sequences

get_nAA count the number of amino acids by gene

read_bed read and check a raw bed file with four columns.

align_charLeft for a vector of character strings, add " " to the right side so they all align to the left when printed

align_charRight for a vector of character strings, add " " to the left side so they all align to the right when printed
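
The padding described for align_charLeft and align_charRight can be approximated with base R's format(). The function names below are hypothetical, and the real helpers may build the padding differently.

```r
# Hypothetical sketches; the GENESPACE helpers may be implemented differently.
align_left_sketch <- function(x) {
  # Pad with spaces on the right so all strings share the widest width
  format(as.character(x), justify = "left")
}

align_right_sketch <- function(x) {
  # Pad with spaces on the left instead
  format(as.character(x), justify = "right")
}

cat(paste0("[", align_left_sketch(c("a", "abc")), "]"), sep = "\n")
cat(paste0("[", align_right_sketch(c("a", "abc")), "]"), sep = "\n")
```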

read_orthofinderSpeciesIDs Parses the SpeciesIDs.txt file into a data.table and returns to R.

read_orthofinderSequenceIDs Reads the sequence IDs:gene name dictionary into memory.

get_nSeqs Counts the number of lines with ">" in a file. If the output is not convertible to an integer, returns NA.
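
As a rough illustration of the counting step, the sketch below reads a file with base R and counts lines containing ">". The name count_headers_sketch is hypothetical, and reading the whole file into memory is an assumption made for simplicity; the actual function may count lines another way.

```r
# Hypothetical sketch; not the GENESPACE source.
count_headers_sketch <- function(filepath) {
  n <- tryCatch(
    # Count lines that contain a literal ">" anywhere in the line
    sum(grepl(">", readLines(filepath, warn = FALSE), fixed = TRUE)),
    # Unreadable or missing files yield NA rather than an error
    error = function(e) NA_integer_,
    warning = function(w) NA_integer_
  )
  as.integer(n)
}

tmp <- tempfile(fileext = ".fa")
writeLines(c(">gene1", "MKVL", ">gene2", "MLLA"), tmp)
count_headers_sketch(tmp)
```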

check_annotFiles ensure the annotations match correctly

check_MCScanXhInstall check that MCScanX_h can be called

parse_ogs read and parse orthofinder orthogroups.tsv files

parse_hogs read and parse orthofinder phylogenetically hierarchical orthogroup (N0.tsv) files

parse_orthologues read and parse orthofinder orthologs

round_toInteger flexible rounding to any integer.
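
One plausible reading of "rounding to any integer" is rounding x to the nearest multiple of `to`. The sketch below assumes that interpretation, which this page does not confirm; the function name is hypothetical.

```r
# Hypothetical sketch, assuming "round to the nearest multiple of `to`".
round_to_multiple_sketch <- function(x, to) {
  # Divide, round to a whole number, then scale back up
  round(x / to) * to
}

round_to_multiple_sketch(c(12, 17, 1234), to = 5)
```

Note that R's round() follows the IEC 60559 convention for exact halves, so values that fall exactly between two multiples may round to the even side.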

add_rle run-length equivalent conversion, either as the length of the runs or the unique run ids.
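
The two output modes described for add_rle can be sketched with base R's rle(). Only the default `which = "n"` appears in the usage; treating any other value as a request for run ids is an assumption, and the function name is hypothetical.

```r
# Hypothetical sketch; the accepted values of `which` other than "n" are assumed.
add_rle_sketch <- function(x, which = "n") {
  r <- rle(x)
  if (which == "n") {
    # Each element gets the length of the run it belongs to
    rep(r$lengths, r$lengths)
  } else {
    # Each element gets the index of its run (1, 2, 3, ...)
    rep(seq_along(r$lengths), r$lengths)
  }
}

x <- c("a", "a", "b", "a", "a", "a")
add_rle_sketch(x, which = "n")
add_rle_sketch(x, which = "id")
```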

gs_colors get a set of colors from the genespace palette

clus_igraph cluster connected subgraphs from pairwise observations
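
To show what clustering connected subgraphs from pairwise observations means, here is a base R sketch that uses simple label propagation instead of igraph, which the function name suggests the package relies on. The function name and return format (a named integer vector of cluster ids) are assumptions.

```r
# Hypothetical base R sketch; swaps in label propagation for igraph.
cluster_pairs_sketch <- function(id1, id2) {
  ids <- unique(c(id1, id2))
  a <- match(id1, ids)
  b <- match(id2, ids)
  lab <- seq_along(ids)  # every node starts in its own cluster
  repeat {
    old <- lab
    for (k in seq_along(a)) {
      # Both endpoints of a pair take the smaller of their two labels
      m <- min(lab[a[k]], lab[b[k]])
      lab[a[k]] <- m
      lab[b[k]] <- m
    }
    # Stop once a full pass changes nothing
    if (identical(lab, old)) break
  }
  # Relabel clusters as 1, 2, 3, ... in order of first appearance
  stats::setNames(match(lab, unique(lab)), ids)
}

cluster_pairs_sketch(id1 = c("a", "b", "d"), id2 = c("b", "c", "e"))
```

For large inputs a dedicated graph library would be far faster than this loop; the sketch is meant only to make the grouping rule concrete.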

are_colors check if a vector is coercible to R colors
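
A common base R idiom for this kind of check is to try grDevices::col2rgb() on each element and treat an error as "not a color". The sketch below assumes that approach and uses a hypothetical function name; the GENESPACE helper may differ.

```r
# Hypothetical sketch using a common idiom; not necessarily the GENESPACE code.
are_colors_sketch <- function(col) {
  vapply(col, function(z) {
    # col2rgb() raises an error for strings it cannot interpret as a color
    tryCatch(is.matrix(grDevices::col2rgb(z)), error = function(e) FALSE)
  }, logical(1), USE.NAMES = FALSE)
}

are_colors_sketch(c("red", "#00FF00", "notacolor"))
```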

scale_between scale a vector between a range
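
Scaling a vector between a range is usually done with a linear min-max transform, sketched below. The function name is hypothetical, the `scale1toMean` argument is omitted because this page does not describe its exact effect, and returning the midpoint for a constant vector is an assumption.

```r
# Hypothetical min-max sketch; `scale1toMean` is deliberately not modeled.
scale_between_sketch <- function(x, min, max) {
  rng <- range(x, na.rm = TRUE)
  # Assumed handling of a constant vector: return the midpoint of the range
  if (rng[1] == rng[2]) return(rep(mean(c(min, max)), length(x)))
  # Map the smallest value to `min` and the largest to `max`
  (x - rng[1]) / (rng[2] - rng[1]) * (max - min) + min
}

scale_between_sketch(c(0, 5, 10), min = 1, max = 3)
```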

read_combBed ensures consistent combBed IO

write_combBed ensures consistent combBed IO

read_allBlast ensures consistent allBlast IO

write_allBlast ensures consistent allBlast IO

read_synHits ensures consistent synHit IO

write_synHits ensures consistent synHit IO

add_alpha add transparency to a color
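
Adding transparency to a color can be sketched with grDevices::col2rgb() and grDevices::rgb(). The function name is hypothetical, and the sketch assumes alpha is given on a 0 to 1 scale (0 fully transparent, 1 opaque), consistent with the default of 1 in the usage.

```r
# Hypothetical sketch, assuming alpha is on a 0-1 scale.
add_alpha_sketch <- function(col, alpha = 1) {
  # Decompose each color into 0-255 red, green, and blue channels
  m <- grDevices::col2rgb(col)
  # Rebuild as hex strings with an alpha channel appended (#RRGGBBAA)
  grDevices::rgb(m[1, ] / 255, m[2, ] / 255, m[3, ] / 255, alpha = alpha)
}

add_alpha_sketch(c("red", "blue"), alpha = 0.5)
```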

read_intSynPos utility to read interpolated syntenic position files

write_intSynPos utility to write interpolated syntenic position files

get_orthofinderVersion Checks that orthofinder is installed and if so, returns the installed version.

get_diamondVersion Checks that DIAMOND is installed and if so, returns the installed version.

theme_genespace specifies publication-style themes. Col here is the color of the panel.background.

download_exampleData downloads chicken and human annotations from NCBI

interp_approx use approx to interpolate missing positions based on the positions of the bounding x/y coordinates.
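
The interpolation idea can be sketched with stats::approx(): known (x, y) pairs define the line segments, and missing y values are filled in at their x positions. The function name and the choice to leave values outside the known range as NA (`rule = 1`) are assumptions.

```r
# Hypothetical sketch; edge-case handling in GENESPACE may differ.
interp_approx_sketch <- function(x, y) {
  ok <- !is.na(x) & !is.na(y)     # points with both coordinates known
  miss <- is.na(y) & !is.na(x)    # points whose y must be interpolated
  # approx() needs at least two known points to interpolate between
  if (sum(ok) < 2 || !any(miss)) return(y)
  out <- y
  # Linear interpolation; positions outside the known range stay NA
  out[miss] <- stats::approx(x[ok], y[ok], xout = x[miss], rule = 1)$y
  out
}

interp_approx_sketch(x = 1:5, y = c(10, NA, 30, NA, 50))
```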

read_refGenomeSynHits read in all syntenic hits files involving a single reference genome and, where necessary, invert the hits so that the reference genome is always the query (genome1).

read_refGenomeAllBlast read in all hits files involving a single reference genome and, where necessary, invert the hits so that the reference genome is always the query (genome1).

get_bedInBlk splits the bed file so that two entries (query and target) match the physical bounds of blocks in the hits object.

add_array2bed add array ID to the bed file

add_arrayReps2bed add array representative genes to the combined bed object

write_pangenes utility to correctly write long-formatted pangene text files.

read_pangenes utility to correctly read in long-formatted pangene text files.

find_nnHit find anchor hits that are nearest to non-anchor xy position

flag_hitsInRadius given a vector of anchors, pulls xy positions within radius of anchors using dbscan

flag_hitsInBlk finds hits within the bounding coordinates of blocks

get_orderedTips Respect ordering of tree when ladderized

pull_pairwise Builds new pairwise files in /syntenicHits

Usage

.onAttach(...)

check_integer(
  x,
  min = -Inf,
  max = Inf,
  default = NA,
  na.rm = FALSE,
  onlySingleValue = length(x) <= 1
)

check_numeric(
  x,
  min = -Inf,
  max = Inf,
  default = NA,
  na.rm = FALSE,
  onlySingleValue = length(x) <= 1
)

check_character(
  x,
  default = NULL,
  na.rm = FALSE,
  onlySingleValue = length(x) <= 1
)

check_logical(x, default = NA, na.rm = FALSE, onlySingleValue = length(x) <= 1)

check_filePathParam(filepath)

check_onlyDNA(path)

read_aaFasta(path)

get_nAA(path)

read_bed(filepath)

align_charLeft(x)

align_charRight(x)

read_orthofinderSpeciesIDs(filepath)

read_orthofinderSequenceIDs(filepath)

get_nSeqs(filepath)

check_annotFiles(filepath, genomeIDs)

check_MCScanXhInstall(filepath)

parse_ogs(filepath, genomeIDs)

parse_hogs(filepath)

parse_orthologues(filepath)

round_toInteger(x, to)

add_rle(x, which = "n")

gs_colors(n = 10)

clus_igraph(id1, id2)

are_colors(col)

scale_between(x, min, max, scale1toMean = TRUE)

read_combBed(filepath)

write_combBed(x, filepath)

read_allBlast(filepath, ...)

write_allBlast(x, filepath)

read_synHits(filepath, ...)

write_synHits(x, filepath)

add_alpha(col, alpha = 1)

read_intSynPos(filepath)

write_intSynPos(x, filepath)

get_orthofinderVersion(filepath)

get_diamondVersion(filepath)

theme_genespace(col = "black")

download_exampleData(filepath)

interp_approx(x, y)

read_refGenomeSynHits(gsParam, refGenome)

read_refGenomeAllBlast(gsParam, refGenome)

get_bedInBlk(hits, bed)

add_array2bed(bed, synBuff, maxIter = 10, reorder = TRUE)

add_arrayReps2bed(bed)

write_pangenes(x, filepath)

read_pangenes(x, filepath, ...)

find_nnHit(x, y, isAnchor, radius)

flag_hitsInRadius(x, y, isAnchor, radius)

flag_hitsInBlk(x, y, blkID)

get_orderedTips(treFile, ladderize = TRUE, genomeIDs)

pull_pairwise(gsParam, verbose)

Arguments

`...`	additional parameters passed on to other functions. If called, `utils` returns its own arguments.
`x`	single-value parameter, string, integer, numeric, list, vector
`min`	if x is an integer or numeric, the minimum value allowed
`max`	if x is an integer or numeric, the maximum value allowed
`default`	if there is a problem with x, replace with this value
`na.rm`	logical, should NAs be dropped?
`onlySingleValue`	logical, should only a single value be returned?
`filepath`	file.path
`path`	file path character string
`genomeIDs`	character vector of genomeIDs
`to`	integer, top end value
`which`	character specifying what to use
`n`	integer, number of observations
`id1`	character, first id
`id2`	character, second id
`col`	color
`scale1toMean`	logical, should values be scaled to 1?
`alpha`	numeric, transparency
`y`	numeric, y values
`gsParam`	genespace parameters, see init_genespace.
`refGenome`	character string specifying the reference genome
`hits`	data.table of syntenic hits
`bed`	data.table containing the combined bed object
`synBuff`	see init_genespace.
`maxIter`	integer, the maximum number of iterations to use
`reorder`	logical, should the gene rank position be re-ordered?
`isAnchor`	logical, is a hit an anchor?
`radius`	numeric, the 2d search radius.
`blkID`	vector of block IDs
`treFile`	file.path to the tree file.
`ladderize`	logical, should the tree be ladderized?
`verbose`	logical, should updates be printed to the console?