Batman-Input: BATMAN Input Files are Explained Here In batman: Bayesian AuTomated Metabolite Analyser for NMR spectra

Description

batman gets input parameters and metabolite templates information from the input files explained here. The input files are in either folder ".../runBATMAN/BatmanInput" or folder "extdata" depending on batman arguments. The user can modify the parameter values in the following input files ( do not change the name of these files): batmanOptions.txt, metabolitesList.csv, multi_data.csv, multi_data_user.csv, NMRdata.txt.

Arguments

batmanOptions.txt

Option file to be used by batman. A copy of this file in the output directory is used for batmanrerun. The parameters in batmanOptions.txt file are explained here with example input values. The parameters have to be listed in the particular order given here, and do not leave empty lines in between except beginning with the comment character "%". Please note that for version 1.0.9 and later, one more input line,

"Use specified chemical shift for spectra (chemShiftperSpectra.csv) file (1/0): 0",

is added at the end of this file. For earlier version users updated to this version, running batman will add the above input line at the end of the file if missing.

ppmRange - ppm ranges for analysis: (1.2, 1.6) (2.1, 2.8)

• Put each set of ppm range in a pair of parentheses in the same line, separate start and end ppm values with a comma, separate each set of ppm range with space. Note that, very small number of spectra variables may cause error in wavelet analysis, do not give very narrow ppm ranges and also check the "Down sampling:" factor below, which used together, may also left very small number of spectra variables.

specNo - Ranges of spectra number to be included (e.g. 1,3-4 etc.): 1-3, 5

• Integer, if no. > 1 and fixed effect (same concentration for all spectra) is 0, user will be asked to choose whether to parallelize fittings between spectra when running batman or rerunbatman.

negThresh - Truncation threshold for negative intensities: -0.5

• Spectrum intensity smaller than the lower limit will be replaced by the lower limit.

scaleFac - Intensity scale factor: 20000

• The whole spectrum will be divided by the normalisation factor.

downSamp - Down sampling factor: 3

• Integer, number of spectra variable will be reduced by the factor of the input parameter, 3, in this case. For the example shown, the spectra variables with the index 1:3:end will be used for analysis.

hiresFlag - Save metabolite fit at resolution of original spectrum?

(Yes - 1 / No - 0): 1

• Whether to save the metabolites fitting result in the original resolution without down sampling. Input 1 for yes, and 0 for no.

randSeed - Random number seed: 25

• Random number generation seed, integer.

nItBurnin - Number of burn-in iterations: 4000

• Integer, this is the number of burn-in iterations. The number of iterations after burn in will be asked when running batman. If changing the range of spectrum causing fitting results inconsistent, this indicates that the burn in stage hasn't found the best chemical shift. User may need to increase burn in iterations or reduce prior truncation on ppm shift for each multiple (adjust parameter "rdelta" below or use "csFlag" function).

nItPostBurnin - Number of post-burn-in iterations: 1000

• Integer, this is the number of post-burn-in iterations. The posterior samples will be saved in the frequency specified by the next parameter.

multFile - Choose template of multiplets file from options below: 2

• Integer, choose a template file from the following options:

1, The default template of multiplets in multi_data.csv file,

2, The user input template of multiplets in multi_data_user.csv file,

3, Both the default and user input template of multiplets files.

thinning - Save MCMC state in every ? iterations: 5

• Integer, save posterior samples for every 5 iterations.

cfeFlag - Same concentration for all spectra (fixed effect)?

(Yes - 1 / No - 0): 0

• Whether all the input spectra have the same metabolite concentrations (e.g. technical replicates). Input 1 for yes, and 0 for no.

nItRerun - Number of iterations for batmanrerun: 5000

• Integer, this is the number of iterations for batmanrerun. The rerun will use fixed multiplets positions obtained from running batman. There is no burnin for batman rerun.

startTemp - Start temperature: 1000

• Sets the start temperature parameter of the likelihood of tempering. Higher temperature may need more burnin iterations to cool down.

specFreq - Spectrometer frequency (MHz): 600

• Spectrometer used to collect the spectrum.

a - Gamma-distributed with shape a: 0.00001

b - Gamma-distributed with scale b: 0.000000001

• Hyper parameters for the global precision priors (λ \sim Gamma(a,b/2)) on wavelet coefficients.

muMean - Mean of prior on global peak width (mu) in ln(Hz): 0

muVar - Variance of prior on global peak width (mu) in ln(Hz): 0.1

muVar_prop - Variance of proposal distribution for mu in ln(Hz): 0.002

nuMVar - Variance of prior on peak width offset (nu_m) in ln(Hz): 0.0025

nuMVarProp - Variance of proposal distribution for nu_m in ln(Hz): 0.0001

• For peak width, γ, in Hz of metabolite m, the model for γ is \ln(γ)= μ + ν_m where μ is the spectrum wide average log-peakwidth and ν_m is a random effect on metabolite deviation from μ. The mean of each prior on ν_m is 0. Set the variance of the prior on ν_m to 0 to turn off the random effect on peak width to keep peaks at the same width. The user can keep the proposal variance parameters unchanged for most of the case.

tauMean - mean of the prior on tau: -0.01

• Hyper priors (τ) on negative wavelet coefficient (truncated normal). A more negative value means the wavelet fit will have more negative component.

tauPrec - inverse of variance of prior on tau: 2

• This parameter is inversely proportional to the variance of the prior on τ.

rdelta - Truncation of the prior on peak shift (ppm): 0.030

• Prior of the truncation on ppm shift for all multiplets, individual prior for each multiplet can be changed in the "multi_data.csv" file. Increase this parameter to allow multiplets to shift more. Please note, increasing this value may need more burn in iterations to find the best chemical shift for multiplets.

csFlag - Specify chemical shift for each multiplet in each spectrum?

(chemShiftperSpectra.csv file) (Yes - 1 / No - 0): 0

• Input "1" to use file "chemShiftperSpectra.csv" to specify chemical shift per multiplet and per spectrum. Input "0" will not use that file. User can use the MATLAB tool "SplineFitBATMAN" provided to get more accurate chemical shift per spectra for each multiplet. This tool will save chemical shift information into "chemShiftperSpectra.csv".

metabolitesList.csv

List of metabolite names to be fitted. Put "%" in front of the metabolite name to comment out any metabolite for batman analysis.

multi_data.csv

Multiplet template parameters file, obtained from the online Human Metabolome Database (HMDB) version 2.5. The user can modify the parameters in the template file and specify ppm positions, and normal distribution truncation of ppm shift parameters (a positive value applied as +/- on the distribution).

The columns are:

Metabolite: The name of metabolite the multiplets belongs to.

pos_in_ppm: The ppm position of the center of the multiplets. Refer to the next two parameters for more explanation. If the next parameter "couple_code" was set to "-1" (empirical multiplet), this corresponds to the ppm position of the "0" Hz offset of the "J_constant". If the "couple_code" was set to "-2" (raster multiplet), this corresponds to the ppm position of the center of the raster multiplet. More details of using empirical multiplet and raster multiplet can be found in the following 3 fields.

couple_code: Coupling code. 0 = singlet, 1 = doublet, 2 = triplet, 3 = quartet, 4 = quintet, 5 = sextet, 6 = septet, 1,1 = doublet of doublets, 1,2 = doublet of triplets, 2,1 = triplet of doublets, 2,2 = triplet of triplets, 1,3 = doublet of quartets, 3,1 = quartet of doublets, 1,1,1 = doublet of doublet of doublets, 2,3 = triplet of quartets, 3,2 = quartet of triplets, 3,3 = quartet of quartets. If "-1" is inputted here, a user specified empirical multiplet can be created. An example can be found in file "multi_data_user.csv". If "-2" is inputted here, a raster multiplet with range specified in ppm in the field "J_constant" is used. Examples can be found in file "multi_data_user.csv".

J_constant: J constant.

If the empirical multiplet is used ("couple_code" is "-1"), J_constant contains the offsets in Hz for peaks (each peak corresponds to a offset in Hz, offsets are separated by comma) of a mutiplet positioned at "pos_in_ppm", J_constant/f (f is the magnet frequency in Hz) is the offset of peak in ppm. Note that the spectra are shown in reverse ppm axis, so a positive offset means peak at higher ppm value, and a negative offset is peak at lower ppm value.

If the raster multiplet is used ("couple_code" is "-2"), the field here requires a two values input (in ppm) separated by comma, which specifies the range of the raster multiplet in the pure spectrum. Note in this case, the field "Metabolite" will also be the .txt file name containing the pure spectrum (refer to createPureSpectraTemplate).

relative_intensity: (previously called no_of_protons) In the ideal case, at full relaxation, it should correspond to the number of protons in each multiplet. If the empirical multiplet ("couple_code" is "-1"), the same number of values (corresponding to each offset in "J_constant"") needed here as peak intensities. In this case, the sum of "relative_intensity" is the number of protons in this multiplet. If the raster multiplet is used ("couple_code" is "-2"), a single value is needed here corresponds to the number of protons in the included raster multiplet at full relaxation.

overwrite_pos: The default is "n" for not overwrite position, and in that case the value in "pos_in_ppm" is used for each multiplet. If user want to use a different value from "pos_in_ppm", it should be put in this column.

overwrite_truncation: The default is "n", and the default truncation value is obtained from the user input truncation on ppm shift (rdelta) in batmanOptions.txt. If the user wants to use different truncations for specific multiplets, it should be put in this column. This value will be used to calculate the ppm shift variance value (truncation/5) for the corresponding multiplets.

Include_multiplet: The default is "1" and all multiplets belong to the listed metabolites will be used. Set to "0" to exclude certain multiplet(s) from listed metabolite(s).

multi_data_user.csv

Metabolite template parameters file for user to add new metabolites in the same format as multi_data.csv.

NMRdata.txt

The file has ppm value as its first column, and real part of the NMR spectrum in each of the subsequent columns. This file will be used when none of the input data argument is given.

batman documentation built on Jan. 25, 2018, 3:05 p.m.