detect_microsatellites: Detect microsatellites
In thierrygosselin/radiator: RADseq Data Exploration, Manipulation and Visualization using R

detect_microsatellites

R Documentation

Detect microsatellites

Description

Detect Simple Sequence Repeats (SSR), commonly known as microsatellites. Rather than reinventing the wheel, radiator relies on the software GMATA: Genome-wide Microsatellite Analyzing Toward Application.
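
To make concrete what kind of sequence gets flagged, here is a minimal base R sketch of a simple sequence repeat check using a regular expression. It is not GMATA's algorithm and does not reproduce its defaults: the function name `has_ssr`, the motif length range (2 to 6 bases) and the minimum number of repeats (5) are assumptions chosen for illustration only.

```r
# Illustrative only: flag sequences where a 2-6 base motif is repeated
# in tandem at least `min.repeats` times. Not GMATA's method or defaults.
has_ssr <- function(sequence, min.repeats = 5) {
  # The back-reference \\1 requires the captured motif to repeat in tandem.
  pattern <- paste0("([ACGT]{2,6})\\1{", min.repeats - 1, ",}")
  grepl(pattern, toupper(sequence), perl = TRUE)
}

# Hypothetical sequences: the first one carries an (AC)n repeat.
sequences <- c("TTGACACACACACACGGT", "ATGCGTACGTTAGCCTAG")
ssr.found <- has_ssr(sequences)
```

For real data, the detection itself should be left to GMATA through `detect_microsatellites`.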

Usage

detect_microsatellites(data, gmata.dir = NULL, ...)

Arguments

`data`	(path or object) Object in your global environment or a file in the working directory. The tibble must contain 2 columns named: `MARKERS` and `SEQUENCE`. When RADseq data from DArT is used, `filter_rad` automatically generates this file under the name `whitelist.markers.tsv`.
`gmata.dir`	(path) For the function to work, the path to the directory containing the GMATA software needs to be given. If not found or `NULL`, the function downloads GMATA from GitHub into the working directory. Default: `gmata.dir = NULL`.
`...`	(optional) Advanced mode that allows passing further arguments to fine-tune the function. Also used for legacy arguments (see details or special section).
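
As a minimal sketch of the input described for `data`, the base R code below builds a two-column table with the required `MARKERS` and `SEQUENCE` names and writes it as a tab-separated file. The marker IDs, the sequences and the file name are hypothetical, and the file is written to a temporary directory only to keep the sketch free of side effects. Since the documentation above expects a file in the working directory, a real run would write it there instead.

```r
# Hypothetical marker IDs and sequences, for illustration only.
markers <- data.frame(
  MARKERS = c("locus_1", "locus_2", "locus_3"),
  SEQUENCE = c(
    "TGCAGACACACACACACGGT",
    "TGCAGATGCGTACGTTAGCC",
    "TGCAGGGATCCATTGCAAGT"
  ),
  stringsAsFactors = FALSE
)

# The function expects exactly these two column names.
stopifnot(all(c("MARKERS", "SEQUENCE") %in% colnames(markers)))

# Write a tab-separated file (temporary location used for this sketch).
input.file <- file.path(tempdir(), "my_whitelist.tsv")
utils::write.table(
  markers,
  file = input.file,
  sep = "\t",
  quote = FALSE,
  row.names = FALSE
)
```

Either the object (converted with `tibble::as_tibble(markers)` if a tibble is needed) or the file could then be supplied through the `data` argument.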

Value

6 files are written to the folder detect_microsatellites:

".fa.fms": the fasta file of sequences
".fa.fms.sat1": the summary of sequences analysed (not important)
".fa.ssr": The microsatellites found per marker (see GMATA doc)
".fa.ssr.sat2": Extensive summary (see GMATA doc).
"blacklist.microsatellites.tsv": The list of markers with microsatellites.
"whitelist.microsatellites.tsv": The whitelist of markers with NO microsatellites.

The object returned in the global environment is a list containing the blacklist and the whitelist.
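
A typical next step is to keep only the markers without microsatellites. The sketch below does this in base R with mock data. It assumes the whitelist has a `MARKERS` column, which this page does not state, and the `genotypes` table and its `INDIVIDUALS` column are hypothetical stand-ins for your own dataset.

```r
# Mock whitelist; in practice it could be read from the output folder, e.g.
# utils::read.delim("detect_microsatellites/whitelist.microsatellites.tsv")
# (assumed path and column layout).
whitelist <- data.frame(
  MARKERS = c("locus_2", "locus_3"),
  stringsAsFactors = FALSE
)

# Hypothetical marker-level dataset to filter.
genotypes <- data.frame(
  MARKERS = c("locus_1", "locus_2", "locus_3", "locus_2"),
  INDIVIDUALS = c("ind_1", "ind_1", "ind_1", "ind_2"),
  stringsAsFactors = FALSE
)

# Keep only rows whose marker is on the whitelist.
filtered <- genotypes[genotypes$MARKERS %in% whitelist$MARKERS, , drop = FALSE]
```

The same pattern with `!` in front of the `%in%` test and the blacklist in place of the whitelist would remove the flagged markers instead.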

Note

Thanks to Peter Grewe for the idea of including this type of filter inside radiator.

Author(s)

Thierry Gosselin thierrygosselin@icloud.com

Examples

## Not run: 
# The simplest way to run the function when the raw data come from DArT:
mic <- radiator::detect_microsatellites(data = "my_whitelist.tsv")

# With the stacks pipeline, the populations module needs to be run with --fasta-loci
# You could prepare the file this way (uncomment the function):
#
# prep_stacks_fasta <- function(fasta.file) {
#   fasta <- suppressWarnings(
#     vroom::vroom(
#       file = fasta.file,
#       delim = "\t",
#       col_names = "DATA",
#       col_types = "c",
#       comment = "#"
#     ) %>%
#       dplyr::mutate(MARKERS = stringi::stri_sub(str = DATA, from = 3, to = 7)) %>%
#       tidyr::separate(data = ., col = DATA, into = c("SEQUENCE", "LOCUS"), sep = "_")
#   )
#
#   fasta <- dplyr::bind_cols(
#     dplyr::filter(fasta, MARKERS == "Locus") %>%
#       dplyr::select(LOCUS),
#     dplyr::filter(fasta, MARKERS != "Locus") %>%
#       dplyr::select(SEQUENCE)
#   ) %>%
#     dplyr::mutate(LOCUS = as.numeric(LOCUS))
#   return(fasta)
# } # prep_stacks_fasta


## End(Not run)

thierrygosselin/radiator documentation built on July 4, 2025, 7:52 a.m.