VDJ_3d_properties: Function to calculate 3D-structure properties such as the...

View source: R/VDJ_3d_properties.R

VDJ_3d_properties    R Documentation

Function to calculate 3D-structure properties such as the average charge and hydrophobicity, pKa shift, free energy, and RMSD of PDB files, and add them to an AntibodyForests-object

Description

Function to calculate protein 3D-structure properties of antibodies (or antibody-antigen complexes) and integrate them into an AntibodyForests-object.

Usage

VDJ_3d_properties(
  VDJ,
  pdb.dir,
  file.df,
  properties,
  sequence.region,
  chain,
  propka.dir,
  free_energy_pH,
  sub.sequence.column,
  germline.pdb,
  foldseek.dir
)

Arguments

VDJ

a dataframe with V(D)J information such as the output of Platypus::VDJ_build(). Must contain columns sample_id, clonotype_id, barcode.
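
The column requirement above can be checked before calling the function. The following is a minimal base-R sketch, not part of the package; the toy dataframe vdj and the names required_cols and missing_cols are hypothetical and only stand in for real Platypus::VDJ_build() output.

```r
# Hypothetical stand-in for a VDJ dataframe; real input would come from
# Platypus::VDJ_build().
vdj <- data.frame(
  sample_id    = c("S1", "S1"),
  clonotype_id = c("clonotype1", "clonotype2"),
  barcode      = c("s1_AAACCTGAGAGTACAT-1", "s1_AAACGGGTCGTTTAGG-1"),
  stringsAsFactors = FALSE
)

# Columns the documentation lists as mandatory.
required_cols <- c("sample_id", "clonotype_id", "barcode")

# Stop early with a readable message if any of them is absent.
missing_cols <- setdiff(required_cols, colnames(vdj))
if (length(missing_cols) > 0) {
  stop("VDJ dataframe is missing column(s): ",
       paste(missing_cols, collapse = ", "))
}
```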

pdb.dir

a directory containing PDB files.

file.df

a dataframe of PDB filenames (column file_name) to be used and sequence IDs (column sequence) corresponding to the barcode column of the VDJ dataframe.
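
A minimal sketch of how such a dataframe could be assembled. It assumes, purely for illustration, that there is one PDB file per barcode named "<barcode>.pdb"; the barcodes are made up, and whether file_name should carry the .pdb extension is not stated here, so treat that as an assumption too.

```r
# Hypothetical barcodes; in practice these come from the barcode column
# of the VDJ dataframe.
barcodes <- c("s1_AAACCTGAGAGTACAT-1", "s1_AAACGGGTCGTTTAGG-1")

# Assumption: one PDB file per barcode, named "<barcode>.pdb".
files <- data.frame(
  file_name = paste0(barcodes, ".pdb"),  # PDB filename for each sequence
  sequence  = barcodes,                  # sequence ID matching the VDJ barcode
  stringsAsFactors = FALSE
)
```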

properties

a vector of properties to be calculated. Default is c("charge", "hydrophobicity").

  • charge: the net electrical charge at pH 7.0

  • hydrophobicity: the summed hydrophobicity of all amino acids, divided by the sequence length (i.e. the mean per-residue hydrophobicity).

  • RMSD_germline: the root mean square deviation to the germline structure (needs the germline pdb)

  • 3di_germline: the edit distance between the 3di sequence of each sequence and that of the germline (needs Foldseek output).

  • pKa_shift: the acid dissociation constant shift upon binding of the antibody to the antigen (needs Propka output)

  • free_energy: the free energy of binding of the antibody to the antigen at a certain pH (needs Propka output)

  • pLDDT: the pLDDT score of the model
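
Three of the definitions above can be made concrete with a short base-R sketch. This is not the package's implementation. It assumes the Kyte-Doolittle scale for hydrophobicity (the scale actually used is not stated here), Levenshtein distance as the edit distance, and an RMSD over matched atoms of structures that are already superimposed. All function names and inputs are hypothetical.

```r
# Kyte-Doolittle hydropathy values (assumed scale, for illustration only).
kd <- c(A = 1.8, R = -4.5, N = -3.5, D = -3.5, C = 2.5, Q = -3.5, E = -3.5,
        G = -0.4, H = -3.2, I = 4.5, L = 3.8, K = -3.9, M = 1.9, F = 2.8,
        P = -1.6, S = -0.8, T = -0.7, W = -0.9, Y = -1.3, V = 4.2)

# Mean per-residue hydrophobicity: sum over residues / sequence length.
mean_hydrophobicity <- function(aa_seq) {
  aa <- strsplit(aa_seq, "")[[1]]
  sum(kd[aa]) / length(aa)
}

# Edit (Levenshtein) distance between two 3di strings via utils::adist().
edit_distance <- function(a, b) as.integer(utils::adist(a, b)[1, 1])

# RMSD between two N x 3 coordinate matrices whose rows are matched atoms.
rmsd <- function(x, y) {
  stopifnot(all(dim(x) == dim(y)))
  sqrt(mean(rowSums((x - y)^2)))
}

# Made-up toy inputs.
h <- mean_hydrophobicity("AIL")
d <- edit_distance("DVVQLVES", "DVLQLVES")   # differs by one substitution
germline <- matrix(c(0, 0, 0,  1, 0, 0), ncol = 3, byrow = TRUE)
variant  <- matrix(c(0, 0, 1,  1, 0, 1), ncol = 3, byrow = TRUE)
r <- rmsd(germline, variant)                 # every atom shifted by 1 unit
```

In this toy case the edit distance is 1 and the RMSD is 1, because both atoms are displaced by exactly one unit along the z axis.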

sequence.region

a character vector of the sequence region to be used to calculate properties. Default is "full.sequence".

  • full.sequence: the full sequence(s) in the PDB file

  • sub.sequence: part of the full sequence, for example the CDR3 region in the PDB file. This sub sequence must be a column in the VDJ dataframe.

  • binding.residues: the binding residues in the PDB file

chain

a character vector of the chain to be used to calculate properties. Default is "HC+LC" (both heavy and light chain). The function assumes that chain "A" is the heavy chain, chain "B" is the light chain, and, if present, chain "C" is the antigen.

  • HC+LC: both heavy and light chain

  • HC: heavy chain, assuming chain A is the heavy chain.

  • LC: light chain, assuming chain B is the light chain.

  • AG: antigen, assuming chain C is the antigen.

  • whole.complex: the whole complex of antibody-antigen (all available chains in the pdb file).

propka.dir

a directory containing Propka output files. The propka filenames should be similar to the PDB filenames.

free_energy_pH

the pH to be used to calculate the free energy of binding. Default is 7.

sub.sequence.column

a character vector of the column name in the VDJ dataframe containing the sub sequence to be used to calculate properties. Default is NULL.

germline.pdb

PDB filename of the germline. Default is NULL.

foldseek.dir

a directory containing dataframes with the Foldseek 3di sequence per chain for each sequence. Filenames should be similar to the PDB filenames, and each dataframe needs a column "chain" containing the "A", "B", and/or "C" chain. Default is NULL.

Value

the input VDJ dataframe with the calculated 3D-structure properties.

Examples

## Not run: 
# `files` is a dataframe with columns file_name and sequence (see argument file.df)
vdj_structure_antibody <- VDJ_3d_properties(VDJ = AntibodyForests::small_vdj,
                          pdb.dir = "~/path/PDBS_superimposed/",
                          file.df = files,
                          properties = c("charge", "3di_germline", "hydrophobicity"),
                          chain = "HC+LC",
                          sequence.region = "full.sequence",
                          propka.dir = "~/path/Propka_output/",
                          germline.pdb = "~/path/PDBS_superimposed/germline_5_model_0.pdb",
                          foldseek.dir = "~/path/3di_sequences/")

## End(Not run)

AntibodyForests documentation built on April 4, 2025, 4:45 a.m.