pangenomes_from_files: Creates Pangenomes from file list

View source: R/pangenomes_from_files.R

pangenomes_from_files    R Documentation

Creates Pangenomes from file list

Description

This function clusters the samples using a designated maximum distance among them and creates a set of pangenomes from the resulting clusters. Pangenomes can be filtered by genome frequency (i.e. the number of genomes that belong to the pangenome). The function then builds a new accnet object that relates each pangenome to its genes/proteins. The user can filter the proteins that are included by their presence frequency in the set.
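As a rough illustration of the idea described above, the following base R sketch groups a toy set of genomes by a maximum distance, discards small groups, and keeps only the proteins present in a minimum number of the remaining groups. The distance values, the presence matrix, the use of complete-linkage hierarchical clustering, and the rule that a protein belongs to a group when any member genome carries it are all assumptions made for the example; this is not the package's implementation.

```r
# Toy pairwise distances among six genomes (made-up values, not Mash output).
genomes <- paste0("g", 1:6)
d <- matrix(0.20, nrow = 6, ncol = 6, dimnames = list(genomes, genomes))
d[1:3, 1:3] <- 0.01   # g1-g3 are close to each other
d[4:5, 4:5] <- 0.02   # g4-g5 are close to each other
diag(d) <- 0

# Assumption: complete linkage, so no pair inside a group exceeds max_distance.
max_distance <- 0.05
groups <- cutree(hclust(as.dist(d), method = "complete"), h = max_distance)

# Keep groups with at least min_pange_size genomes (2 here, for the toy data).
min_pange_size <- 2
sizes <- table(groups)
kept <- as.integer(names(sizes)[sizes >= min_pange_size])
membership <- data.frame(genome = names(groups), pangenome = unname(groups))
membership <- membership[membership$pangenome %in% kept, ]

# Toy presence/absence of four proteins in the six genomes (made-up values).
presence <- matrix(
  c(1, 1, 1, 1, 1, 0,   # protA
    1, 1, 0, 0, 0, 0,   # protB
    0, 0, 0, 1, 0, 1,   # protC
    1, 0, 0, 0, 1, 1),  # protD
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("prot", LETTERS[1:4]), genomes)
)

# Assumption: a protein is in a group if any member genome carries it.
kept_genomes <- presence[, membership$genome, drop = FALSE]
in_pangenome <- t(rowsum(t(kept_genomes), group = membership$pangenome)) > 0

# Keep proteins present in at least min_prot_freq groups.
min_prot_freq <- 2
prot_freq <- rowSums(in_pangenome)
kept_proteins <- names(prot_freq)[prot_freq >= min_prot_freq]
```

With this toy data, the singleton genome g6 is dropped by the size filter, and only the proteins found in both remaining groups pass the frequency filter.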

Usage

pangenomes_from_files(
  files,
  min_pange_size = 10,
  min_prot_freq = 2,
  file_type = "prot",
  distance,
  cluster,
  coverage = 0.8,
  identity = 0.8,
  evalue = 1e-06,
  n_cores,
  cov_mode = 0,
  cluster_mode = 0
)

Arguments

files

A one-column data.frame with the paths of the files.

min_pange_size

Minimum number of genomes per pangenome.

min_prot_freq

Minimum frequency of a protein (the minimum number of pangenomes in which it is present).

file_type

Type of FASTA file ("prot" or "nucl").

distance

Maximum distance (Mash distance) among genomes in each pangenome. (Excludes cluster.)

cluster

(Optional; excludes distance.) A data.frame of two columns (file, cluster) describing the clustering of the files.

coverage

Minimum coverage (length) to cluster.

identity

Minimum identity.

evalue

Maximum E-value.

n_cores

Number of cores to use.

cov_mode

Coverage mode:

  • 0: Coverage of query and target

  • 1: Coverage of target

  • 2: Coverage of query

  • 3: Target seq. length needs to be at least x% of query length

  • 4: Query seq. length needs to be at least x% of target length

cluster_mode

Cluster mode:

  • 0: Setcover

  • 1: Connected component

  • 2: Greedy clustering by sequence length

  • 3: Greedy clustering by sequence length (low mem)
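The files and cluster arguments are both plain data frames, so they can be assembled with base R. The sketch below uses hypothetical file names and cluster labels, and the column name File for the one-column table is an assumption; only the (file, cluster) layout comes from the argument descriptions. The check_grouping() helper is purely illustrative and not part of the package: it mirrors the note that distance and cluster exclude each other.

```r
# Hypothetical FASTA paths; replace with real files.
paths <- c("genomes/a.faa", "genomes/b.faa", "genomes/c.faa")

# One-column data.frame for the `files` argument (column name is an assumption).
files <- data.frame(File = paths, stringsAsFactors = FALSE)

# Two-column data.frame (file, cluster) for the optional `cluster` argument.
cluster <- data.frame(
  file = paths,
  cluster = c(1, 1, 2),   # made-up cluster labels
  stringsAsFactors = FALSE
)

# Illustrative helper (not from the package): reject calls that give both.
check_grouping <- function(distance = NULL, cluster = NULL) {
  if (!is.null(distance) && !is.null(cluster)) {
    stop("Supply either `distance` or `cluster`, not both.")
  }
  invisible(TRUE)
}

check_grouping(cluster = cluster)
```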

Value

An accnet object with an extra membership table.
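Examples

A minimal sketch of a call, assuming the PATO package and its external dependencies are installed. The directory name, the .faa extension, the File column name, the distance threshold, and the core count are all illustrative assumptions; the remaining values repeat the defaults shown under Usage. The call is guarded so that nothing runs when the package or the input files are missing.

```r
# Hypothetical directory with one protein FASTA file per genome.
fasta_dir <- "genomes_faa"
paths <- list.files(fasta_dir, pattern = "\\.faa$", full.names = TRUE)
files <- data.frame(File = paths, stringsAsFactors = FALSE)

# Run only when PATO is installed and some input files were found.
if (requireNamespace("PATO", quietly = TRUE) && nrow(files) > 0) {
  pange <- PATO::pangenomes_from_files(
    files,
    distance = 0.03,       # illustrative Mash distance threshold
    min_pange_size = 10,   # default: at least 10 genomes per pangenome
    min_prot_freq = 2,     # default: protein present in at least 2 pangenomes
    file_type = "prot",
    n_cores = 2            # illustrative core count
  )
}
```

Here distance is supplied and cluster is left out, since the two arguments exclude each other.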


irycisBioinfo/PATO documentation built on Oct. 19, 2023, 3:07 p.m.