library(tidyverse)
library(here)
library(knitr)
library(kableExtra)
library(tidygraph)
library(ggraph)
devtools::load_all()

Introduction

Atherosclerosis

Atherosclerosis is a histopathological state of the arterial wall characterized by lipid accumulation and crystallization, chronic inflammation, cell death, fibrosis, calcification, structural deformation, rupture, hemorrhage, and thrombosis [@pmid7648691; @pmid8181179; @pmid1728483].

Current causal hypotheses of atherosclerosis are complex, involving hundreds of environmental, hemodynamic, immunological, lipidomic, biogerontological, and other variables [@pmid30846875; @pmid22064431]. At the same time, the key effect of the plasma concentration of low-density lipoprotein (LDL) and other apolipoprotein B (apoB)-containing particles on atherosclerotic cardiovascular disease has become increasingly certain [@pmid28444290] (Figure 1). In fact, atherosclerosis could be seen primarily as a crystallopathy [@pmid27332905].

knitr::include_graphics(
  path = here::here("analysis", "images", "ldlc-ascvd-eas-ference-2017.pdf"),
  dpi = 320,
  auto_pdf = TRUE
)

The harmful effects of atherosclerosis are currently unmatched in occurrence, including diseases of the major arteries and valves and their consequences for symptoms, disability, death, and society at large (Table 1). For example, in 2017 almost the same proportion of global deaths was attributed to ischaemic heart disease (16 %), most of which would not exist without atherosclerosis, as to all cancers combined (17 %). Similarly, 7 % of disability-adjusted life years (DALYs) were attributed to ischaemic heart disease, whereas all mental disorders were estimated to account for 5 % [@pmid30496104; @pmid30496103].

options(knitr.kable.NA = '')

here("raw-data", "metadata", "review-numbers.csv") %>%
  read_csv(col_types = "cccccnnnc") %>%
  select(-time, -location, -pmid) %>%
  kable(
    format = "latex",
    caption = "Major effects of atherosclerosis with global 2017 estimates based on the Global Burden of Disease 2017 results.",
    booktabs = TRUE,
    format.args = list(big.mark = ","),
    col.names = c("Cause", "Effect", "Measure", "Low", "Point", "High")
  ) %>%
  kable_styling(font_size = 7, position = "center") %>%
  column_spec(1:2, width = "3cm") %>%
  column_spec(3, width = "2cm") %>%
  column_spec(4:6, width = "1cm") %>%
  add_header_above(c(" " = 3, "Estimate" = 3)) %>%
  collapse_rows(1:3, valign = "top")

Diagnostic and therapeutic solutions to atherosclerotic disease are outlined in practice guidelines that are updated regularly by various expert panels around the world using various methods, although actual practice may vary much more widely [@10.1093/eurheartj/ehx095; @10.1093/eurheartj/ehw106; @pmid23996286]. In summary, the main interventions currently include physical exercise, dietary interventions, smoking cessation, acetylsalicylic acid, clopidogrel, statins, angiotensin-converting enzyme inhibitors, nitrates, beta blockers, calcium channel blockers, angioplasty, stenting, and bypass surgery. The main diagnostic methods used to select and tailor these interventions include clinical history and examination, rest and stress electrocardiography, stress echocardiography, stress nuclear imaging, computed tomography, and angiography.

Given the magnitude of the harm that atherosclerosis still causes today, we are in dire need of better predictive and therapeutic interventions, as well as better implementation of existing ones.

Plasma lipidomics

It has been estimated that lipids make up almost three fourths of the biological human plasma mass, consisting of up to hundreds of thousands of distinct, chemically diverse molecules. Of the almost 600 species that were measurable in 2010, sterols, glycerophospholipids (160 species), and glycerolipids account for almost all of the total concentration, with progressively smaller molar contributions from sphingolipids (204 species), fatty acyls, and prenol lipids [@pmid22070478]. Lipids are also constantly and randomly modified by oxidative reactions. All individual lipid structures are classified in the LIPID MAPS database [@pmid19098281]. Lipid species are closely connected through their metabolic network (Figure 2). At a higher level of organization, plasma lipids are either free, protein-bound, or packed into various lipoprotein particles and other lipid structures.

knitr::include_graphics(
  path = here("analysis", "images", "review-lipid-class-network.png"),
  dpi = 650
)

Like other omics studies, plasma lipidomic epidemiology should be viewed as hypothesis-generating in its ability to provide evidence for a particular causal model, even though its dimensionality is not yet as large. In lipidomics studies, hundreds of lipids are measured simultaneously from prepared samples with mass spectrometry. Mass spectrometry can be run with different separation, ionization, acceleration, detection, and computational methods, which ultimately lead to different performance profiles and sources of error [@pmid27286762].

Major lipidomic findings on cardiovascular diseases were reviewed in 2018 [@pmid29665359]. Dozens of lipid species and subclasses have been associated with features ranging from plaque stability to mortality, and the literature is growing toward an eventual make-or-break systematic review, meta-analysis, and validation.

Objectives

In this research project, we aim to contribute valuable evidence on how 1) the plasma lipidome, 2) the serum NMR metabolome, and 3) the whole blood and monocyte transcriptomes relate to atherosclerotic lesions, vascular disease, and mortality, as well as to many traditional causes and predictors of atherosclerosis, in the Tampere Vascular Study cohort.

This evidence will contribute to the crucial ongoing effort to produce an accurate systems-wide model of atherosclerosis at the molecular level in order to design optimal predictive and therapeutic technology in the future.

Methods

Design

The Tampere Vascular Study (TVS) cohort consists of two consecutive series of patients scheduled for vascular surgery or exercise stress testing, who are followed up for vascular events and from whom an extensive set of samples and variables has been collected. Both series have been described previously [@pmid24122613; @pmid16515696].

Setting

Patients were recruited at two departments of Tampere University Hospital (Tampere, Finland) between 10/2001 and 2009: in the Division of Vascular Surgery and the Heart Center during 2005-2009, and in the Division of Clinical Physiology during 10/2001-1/2008.

In the exercise stress testing series, venous blood samples were drawn, prepared, and stored in 2008 by the personnel of FimLab Laboratories (Tampere, Finland). In the vascular surgery series, arterial samples were collected at the time of participation under the supervision of a senior vascular surgeon in the Division of Vascular Surgery and the Heart Center (Tampere University Hospital, Tampere, Finland), and were prepared and stored by the personnel of FimLab Laboratories (Tampere, Finland).

All patients filled in a questionnaire at the time and place of participation. Health record-based information has been collected by members of the research group during the study. Clinical chemistry data are available from 10/1998 onwards, and other electronic health record (EHR) data have no time constraints. The arterial tissue slides were examined microscopically by a senior pathologist at FimLab Laboratories (Tampere, Finland). NMR metabolomics measurements were done in 2010 by the NMR Metabolomics Laboratory (University of Kuopio) and the Computational Medicine Research Group (University of Oulu). Lipidomics measurements were done in 2017 by Zora Biosciences Oy (Espoo, Finland).

Participants

Recruiters asked vascular surgery patients to participate if they were undergoing carotid endarterectomy for over 70 % carotid stenosis, coronary artery bypass grafting for symptomatic coronary artery disease, or femoral or aortic endarterectomy with aortoiliac or aortobifemoral bypass for symptomatic peripheral arterial disease. Similarly, recruiters asked patients to participate if they were undergoing an exercise stress test or a spiroergometry test; from this population, patients who subsequently underwent coronary angiography were selected for the present cohort. Patients who did not give informed consent were excluded.

Variables

The complete study database includes data on the patients and their whole blood, monocyte, plasma, serum, and arterial samples. The main sets of variables include DNA (single nucleotide polymorphisms, copy number variations), RNA, microRNA, bacteria, lipids, lipoproteins, small metabolites, histology, stenosis, hemodynamics, and mortality, as well as relevant diagnoses, symptoms, treatments, environmental exposures, and clinical chemistry.

Measurement

Questionnaire

All participants filled in a questionnaire covering basic characteristics (age, sex, weight, and height); exposures (cigarette or pipe smoking history and current alcohol consumption); current diagnoses (diabetes, hypercholesterolemia, hypertension, cardiac insufficiency, cancer, lung disease, liver disease, rheumatic disease, kidney disease, thyroid gland disease, and dementia); and current treatments (hormone replacement therapy, diabetes drugs, statins, and blood pressure drugs).

Health records and registries

Mortality data, including the date and cause of death, are collected from the national cause of death register or the electronic health records for each analysis. Similarly, vascular states and events, including coronary artery disease, myocardial infarction, coronary heart disease, arteriosclerosis obliterans, cerebrovascular disease, and peripheral vascular disease, as well as symptoms according to the New York Heart Association (NYHA) and Canadian Cardiovascular Society (CCS) classification systems, have been assessed from the electronic health records of Tampere University Hospital.

For all participants, clinical chemistry variables, including the maximum plasma total, LDL, and HDL cholesterol and triglyceride concentrations, the minimum CRP, leukocyte, and creatinine concentrations, as well as hematocrit, hemoglobin, and platelet count, have been collected from the participants' electronic health records at Tampere University Hospital. The variables have been measured as part of the participants' routine healthcare with a Cobas Integra 700 automatic analyser, using reagents and calibrators recommended by the manufacturer (Hoffmann-La Roche Ltd., Basel, Switzerland), and LDL cholesterol (LDL-C) has been calculated with the Friedewald formula.
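For reference, the Friedewald formula estimates LDL-C from total cholesterol (TC), HDL cholesterol (HDL-C), and triglycerides (TG). The R sketch below assumes concentrations in mmol/L, where LDL-C = TC - HDL-C - TG/2.2, and follows the common convention of not reporting the estimate when TG exceeds 4.5 mmol/L. The function name `friedewald_ldl()` is hypothetical, and the laboratory's own reporting rules may differ.

```r
# Hypothetical helper: Friedewald estimate of LDL cholesterol.
# Assumes all inputs are in mmol/L (in mg/dL the TG divisor would be 5).
friedewald_ldl <- function(total_chol, hdl_chol, triglycerides) {
  ldl <- total_chol - hdl_chol - triglycerides / 2.2
  # The estimate is unreliable at high triglyceride levels, so it is
  # set to missing above the conventional 4.5 mmol/L cut-off.
  ifelse(triglycerides > 4.5, NA_real_, ldl)
}

friedewald_ldl(
  total_chol = c(5.2, 6.0),
  hdl_chol = c(1.3, 1.0),
  triglycerides = c(1.5, 5.0)
)
```

Here the first patient gets an estimate of about 3.22 mmol/L (5.2 - 1.3 - 1.5/2.2), while the second is set to `NA` because of the high triglyceride level.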

Physiological data on the extent of arterial stenosis, blood pressure, and ankle-brachial index are also collected via the health records, as are treatment data on the history of percutaneous transluminal coronary angioplasty (PTCA) and coronary artery bypass grafting (CABG).

Arterial samples

For all patients selected for the vascular surgery series, arterial samples were collected. Coronary artery bypass patients provided lesion-free left internal thoracic artery (LITA) samples, and the other patients provided atherosclerotic endarterectomy samples from the diseased artery.

The arterial tissue samples for RNA measurements were stabilized with RNAlater (Ambion Inc., Austin, TX, USA) and purified with the Trizol reagent (Invitrogen, Carlsbad, CA, USA) and the RNeasy Kit with DNase Set (Qiagen, Valencia, CA, USA). RNA concentrations were checked with the BioPhotometer (Eppendorf, Wesseling-Berzdorf, Germany), and the samples were stored at \SI{-80}{\celsius}.

RNA was quantified from the stored samples using the following protocol. First, the RNA quantity in the samples was validated with the Nanodrop ND-1000 spectrophotometer (Nanodrop Technologies, Wilmington, DE, USA). Then, for each sample, 200 ng of RNA was reverse-transcribed with the Illumina RNA Amplification Kit (Ambion, Inc., Austin, TX, USA) (cat. no I1755), and the resulting cDNA was transcribed in vitro to cRNA for 14 hours with biotin-11-deoxyuridine triphosphate for labelling (PerkinElmer Life And Analytical Sciences, Inc., Boston, MA, USA). The cRNA quantity was again validated with the Nanodrop ND-1000 spectrophotometer, and its quality was checked with the Experion Automated Electrophoresis System and RNA StdSens Analysis Kit (Bio-Rad Laboratories, Inc., Hercules, CA, USA). Then, 1500 ng of cRNA was hybridized for 18 hours at \SI{55}{\celsius} to Sentrix Human-8 Expression BeadChip arrays (Illumina, San Diego, CA, USA). The hybridized cRNA was labelled with 1 µg/ml cyanine-3-streptavidin (Amersham Biosciences, Piscataway, NJ, USA), and finally the arrays were scanned with the Illumina BeadArray Reader (Illumina, San Diego, CA, USA).

Arterial samples for histological measurements were fixed, dissected, sliced, and stained following a standard clinical laboratory protocol, and were classified microscopically by a pathologist according to the American Heart Association (AHA) atherosclerotic lesion classification system [@pmid7648691; @pmid8181179; @pmid1728483]. Some histological samples have also undergone targeted immunohistochemical staining in previous studies to assess the localization of specific molecules.

Whole blood and monocyte samples

RNA from whole blood samples for gene expression analyses was isolated from PAXgene tubes (BD, Franklin Lakes, NJ, USA) with the PAXgene Blood RNA Kit with DNase Set (Qiagen). RNA quantity was checked with the BioPhotometer (Eppendorf, Wesseling-Berzdorf, Germany), and the samples were stored at \SI{-80}{\celsius}. Monocyte fractions were obtained by Ficoll-Paque density-gradient centrifugation (Amersham Pharmacia Biotech UK Limited, Buckinghamshire, England); RNA was isolated from the fractions with the RNeasy Mini Kit (Qiagen), RNA quantities were checked with the BioPhotometer (Eppendorf, Wesseling-Berzdorf, Germany), and the samples were stored at \SI{-80}{\celsius}.

RNA was quantified from the stored samples with the same method as for the arterial RNA samples described above, except that the Illumina HumanHT-12 v3 Expression BeadChip and the Illumina iScan system (Illumina, San Diego, CA, USA) were used.

Plasma and serum samples

Lipids were extracted for lipidomic measurement from all available plasma and serum samples according to the following protocol. First, 10 µl of 10 mM 2,6-di-tert-butyl-4-methylphenol in methanol, 20 µl of internal standards (Avanti Polar Lipids Inc., Alabaster, AL), and 300 µl of chloroform:methanol (2:1, v:v) (Sigma-Aldrich GmbH, Steinheim, Germany) were added to 10 µl of sample. The sample was mixed and sonicated in a water bath for 10 min, incubated for 40 min, and centrifuged for 15 min at 5700 × g. The upper phase was transferred and evaporated under nitrogen. The lipids were then resuspended in 100 µl of water-saturated butanol and sonicated in a water bath for 5 min; 100 µl of methanol was added, the sample was centrifuged for 5 min at 3500 × g, and finally the supernatant was collected for mass spectrometry. The extraction method has been described in more detail previously [@pmid27044508].

Lipids were measured from the prepared samples with a hybrid triple quadrupole/linear ion trap mass spectrometer (QTRAP 5500, AB Sciex, Concord, Canada) coupled to an ultra-high-performance liquid chromatography system (Nexera-X2, Shimadzu, Kyoto, Japan) with Acquity BEH C18 columns (2.1 mm i.d. × 50 mm, 1.7 µm; Waters Corporation, Milford, MA, USA), using a scheduled multiple reaction monitoring algorithm. Data were preprocessed with the Analyst and MultiQuant 3.0 software (AB Sciex). The measurement method has been described in more detail previously [@pmid29262533].

All available serum samples have also been assessed for lipoproteins and small metabolites as follows. The samples were first thawed overnight at \SI{+4}{\celsius}, mixed, and centrifuged at 3400 × g, and 300 µl of sample was mixed with 300 µl of sodium phosphate buffer, all with a Gilson liquid handler. The prepared samples were then kept at \SI{+6}{\celsius} until being preheated to \SI{+37.5}{\celsius} for proton nuclear magnetic resonance spectroscopy on a Bruker AVANCE III spectrometer (Bruker BioSpin) in two configurations: the LIPO window for lipoproteins and the LMWM window for small metabolites. Finally, the metabolomic variables were estimated from the measured spectra with regression models, iterative lineshape fitting analysis, and the extended Friedewald method. The method has been described in more detail previously [@pmid19684899].

Study size

The cohort size was originally designed for hypothesis-driven analyses rather than for omics, so the presently planned analyses have low statistical power on their own. We take this into account in the analysis and interpretation of the results, since meta-analysis and validation are clearly needed to reach reasonable levels of certainty in all omics research.
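To make the power limitation concrete, the R sketch below approximates the power of a two-sided test for a single Pearson correlation with the Fisher z transformation, with and without a Bonferroni correction. The cohort size of 290 comes from the initial results, but the number of tested lipids (500) and the true correlation (0.15) are hypothetical values chosen only for illustration.

```r
# Approximate power of a two-sided test of H0: rho = 0, using the
# Fisher z transformation, atanh(r) ~ N(atanh(rho), 1 / (n - 3)).
correlation_power <- function(rho, n, alpha) {
  z_effect <- atanh(rho) * sqrt(n - 3)
  z_crit <- qnorm(1 - alpha / 2)
  pnorm(z_effect - z_crit) + pnorm(-z_effect - z_crit)
}

n <- 290        # cohort size from the initial results
n_tests <- 500  # hypothetical number of lipid species tested
rho <- 0.15     # hypothetical true correlation

p_single <- correlation_power(rho, n, alpha = 0.05)
p_bonferroni <- correlation_power(rho, n, alpha = 0.05 / n_tests)
round(c(single = p_single, bonferroni = p_bonferroni), 2)
```

Under these assumptions, power drops from roughly 0.73 for one pre-specified test to about 0.09 after correcting for 500 tests, which illustrates why replication and meta-analysis matter.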

Data analysis

We detect errors, inform modelling decisions, and describe the study population and the internal structure of the datasets, first by exploring univariate distributions and patterns with visualizations and statistical summaries, and then by exploring multivariate patterns with four main methods: topological overlap matrix (TOM) networks, agglomerative hierarchical clustering, principal component analysis (PCA), and uniform manifold approximation and projection (UMAP). We study the composition of potential hidden features and their relationships to other variables with simple regression models.
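As an illustration of the multivariate exploration, the R sketch below builds an unsigned topological overlap matrix from simulated data, clusters the lipids hierarchically on TOM dissimilarity, and runs a PCA, all in base R. It is a simplified stand-in for what WGCNA provides (the package also offers, for example, soft-threshold selection and dynamic tree cutting). The simulated data, the soft-thresholding power of 8, and the fixed cut height are assumptions made only for illustration; UMAP is omitted because it needs an external package.

```r
set.seed(1)

# Hypothetical stand-in for a lipidomics matrix: 100 samples x 30 lipids,
# with two blocks of lipids driven by two latent factors plus 10 noise lipids.
n <- 100
f1 <- rnorm(n)
f2 <- rnorm(n)
lipids <- cbind(
  sapply(1:10, function(i) f1 + rnorm(n, sd = 0.5)),
  sapply(1:10, function(i) f2 + rnorm(n, sd = 0.5)),
  sapply(1:10, function(i) rnorm(n))
)
colnames(lipids) <- paste0("lipid_", seq_len(ncol(lipids)))

# Unsigned soft-threshold adjacency a_ij = |cor_ij|^beta (no self-loops).
beta <- 8  # assumed soft-thresholding power
adjacency <- abs(cor(lipids))^beta
diag(adjacency) <- 0

# Topological overlap: (shared neighbours + direct link) divided by
# (smaller connectivity + 1 - direct link).
shared <- adjacency %*% adjacency
connectivity <- rowSums(adjacency)
min_k <- outer(connectivity, connectivity, pmin)
tom <- (shared + adjacency) / (min_k + 1 - adjacency)
diag(tom) <- 1

# Agglomerative (average-linkage) clustering on TOM dissimilarity,
# cut at a fixed height chosen for this toy example.
tree <- hclust(as.dist(1 - tom), method = "average")
modules <- cutree(tree, h = 0.95)

# PCA of standardized lipids for a complementary low-dimensional view.
pca <- prcomp(lipids, center = TRUE, scale. = TRUE)
summary(pca)$importance[, 1:3]
table(modules)
```

With this simulated structure, each block of co-varying lipids should form one module, while the independent lipids remain singletons above the cut height.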

We use regression modelling techniques, including survival models, to study both crude associations and adjusted causal effects of the blood omics variables on the outcomes. For causal inference, we minimize confounding with plausible causal assumptions, target trial emulation, and a suitable g-method, such as inverse probability weighting or g-computation. We also study the predictive power of the blood omics variables with flexible black-box machine learning methods, such as stacked ensembles.
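The sketch below illustrates one of the g-methods mentioned above, inverse probability weighting, on simulated data in base R. The variables (age as the only confounder, a binary exposure standing in for, e.g., a high versus low level of a single lipid, and a binary outcome) and all coefficients are hypothetical; a real analysis would also need a target trial specification, careful confounder selection, and weight diagnostics.

```r
set.seed(2019)

# Simulated toy data (hypothetical): age confounds the relationship
# between a binary exposure and a binary outcome.
n <- 5000
age <- rnorm(n, mean = 63, sd = 10)
exposure <- rbinom(n, 1, plogis(-6 + 0.1 * age))
outcome <- rbinom(n, 1, plogis(-5 + 0.05 * age + 0.5 * exposure))

# Marginal risk difference implied by the simulation (the target).
true_rd <- mean(plogis(-5 + 0.05 * age + 0.5) - plogis(-5 + 0.05 * age))

# 1. Propensity score: probability of exposure given the confounder.
ps <- fitted(glm(exposure ~ age, family = binomial()))

# 2. Stabilized inverse probability weights.
p_exposed <- mean(exposure)
w <- ifelse(exposure == 1, p_exposed / ps, (1 - p_exposed) / (1 - ps))

# 3. Weighted risks in each group estimate the risks had everyone
#    (or no one) been exposed; their difference is the marginal effect.
risk_1 <- weighted.mean(outcome[exposure == 1], w[exposure == 1])
risk_0 <- weighted.mean(outcome[exposure == 0], w[exposure == 0])
ipw_rd <- risk_1 - risk_0
crude_rd <- mean(outcome[exposure == 1]) - mean(outcome[exposure == 0])

round(c(true = true_rd, crude = crude_rd, ipw = ipw_rd), 3)
```

Because older patients are both more often exposed and at higher risk, the crude difference is biased upwards, whereas the weighted estimate should land close to the simulated target.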

Preprocessing methods, including the handling of outliers, missing values, scale, dispersion, and distributional shape, as well as feature engineering, are an inherent part of each analytical pipeline. The same applies to post-processing methods, that is, how observation- and model-level results such as features, parameters, and evaluation measures are further analyzed, interpreted, and presented. Pre- and post-processing steps are highly data-dependent and are made explicit case by case.

As a general principle, when there is little reason to select a particular version of a pipeline a priori (e.g. hyperparameters), we run multiple versions as a sensitivity analysis and interpret the multiplicity of results accordingly.
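A minimal R sketch of this kind of sensitivity analysis follows: the same association is estimated under every combination of two preprocessing choices, and the spread of the estimates is inspected instead of picking a single version. The toy data, the helper `preprocess()`, and the particular options (log transformation, median or half-minimum imputation) are hypothetical examples, not the project's actual pipeline.

```r
set.seed(42)

# Hypothetical toy data: one right-skewed lipid concentration with some
# missing values, and a continuous outcome.
n <- 200
lipid <- rlnorm(n, meanlog = 0, sdlog = 0.8)
outcome <- 0.5 * log(lipid) + rnorm(n)
lipid[sample(n, 20)] <- NA

# One version of a preprocessing pipeline: impute, optionally transform,
# then standardize.
preprocess <- function(x, transform, impute) {
  fill <- switch(impute,
    median = median(x, na.rm = TRUE),
    half_min = min(x, na.rm = TRUE) / 2
  )
  x[is.na(x)] <- fill
  if (transform == "log") x <- log(x)
  as.numeric(scale(x))
}

# All combinations of the choices that are hard to fix a priori.
grid <- expand.grid(
  transform = c("none", "log"),
  impute = c("median", "half_min"),
  stringsAsFactors = FALSE
)

# Standardized regression coefficient for each pipeline version.
grid$estimate <- mapply(function(transform, impute) {
  x <- preprocess(lipid, transform, impute)
  unname(coef(lm(outcome ~ x))["x"])
}, grid$transform, grid$impute, USE.NAMES = FALSE)

grid
```

If the estimates agree across versions, the conclusion is robust to these choices; if they diverge, the divergence itself is reported.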

Software

We report all software used in the research in full. We aim to follow best practices for scientific computing [@pmid24415924]. Briefly, we use the statistical computing language R with the RStudio IDE and key external packages, including the tidyverse suite of general data science tools, WGCNA and mixOmics for omics-specific analyses, rmarkdown for literate-programming-based reporting, and many more [@rref; @tidyverseref; @rmarkdownref; @wgcnaref; @mixomicsref]. We organize the project as an open-source R package, which lets us leverage the language's software development infrastructure, including tools for project and dependency management, documentation, and version control.

Reporting

We aim to adhere to open science best practice and make our research output as comprehensive, transparent, findable, accessible, interoperable, reproducible, and reusable as possible.

We follow relevant reporting guidelines, including the STROBE guideline for observational epidemiology and the PRISMA guideline for systematic reviews. We also aim to publish primary reports in freely accessible forums, such as medical preprint servers. Sensitive data may be shared securely only on reasonable request, since sharing the raw data publicly would compromise research participants' privacy and consent under our legislation. We use Git and GitHub for version control of all non-sensitive information. All non-sensitive research material is available on the project's GitHub page.

Initial results

The cohort includes a total of 290 patients: 134 (46 %) from the vascular surgery series and 156 (54 %) from the exercise testing series (Table 2). We have also started exploring the lipidomics data for potential patterns (Figure 3).

  options(knitr.kable.NA = '')

  # TODO: compute values directly from data – this is just a test
  "
  Variable,Summary,Missing
  Age,63 years; range 22-91,NA
  Sex,70 % male; 30 % female,NA
  Coronary artery disease,70 %,50 (17 %)
  Coronary heart disease,67 %,23 (8 %)
  Myocardial infarction,35 %,8 (3 %)
  Peripheral artery disease,16 %,4 (1 %)
  Asymptomatic,3 %,NA
  Idiopathic symptoms,1 %,NA
  Claudication,8 %,NA
  Acute limb ischaemia,0.3 %,NA
  Critical limb ischaemia,3 %,NA
  Cerebrovascular disease,22 %,4 (1 %)
  Asymptomatic,1 %,NA
  Idiopathic symptoms,3 %,NA
  Amaurosis fugax,5 %,NA 
  Transient ischaemic attack,6 %,NA
  Cerebral infarction,8 %,NA
  Cancer,6 %,2 (1 %)
  Cardiac insufficiency,19 %,2 (1 %)
  Dementia,1 %,7 (2 %)
  Diabetes,22 %,2 (1 %)
  Type I,1 %,NA
  Type II,17 %,NA
  Hypercholesterolemia,71 %,5 (2 %)
  Hypertension,91 %,3 (1 %)
  Kidney disease,8 %,2 (1 %)
  Rheumatic disease,5 %,2 (1 %)
  Thyroid disease,7 %,2 (1 %)
  Alcohol consumption,66 %,61 (21 %)
  Rarely,50 %,NA
  Moderately,7 %,NA
  Unknown quantity,7 %,NA
  Smoking history,63 %,6 (2 %)
  Quit,36 %,NA
  Current,27 %,NA
  Smoking duration,median 28 years; range 1-60 years,NA
  Smoking amount,NA,56 (19 %)
  0 / day,45 %,NA
  1-4 / day,15 %,NA
  5-14 / day,11 %,NA
  15-25 / day,17 %,NA
  Over 25 / day,3 %,NA
  " %>%
    read_csv(col_types = "ccc") %>%
    knitr::kable(
      format = "latex", 
      caption = "Baseline characteristics of the Tampere Vascular Study cohort.",
      align = "lcc",
      booktabs = TRUE
    ) %>%
    kableExtra::kable_styling(font_size = 7, position = "center") %>%
    kableExtra::column_spec(c(1, 3), width = "3 cm") %>%
    kableExtra::column_spec(2, width = "6 cm") %>%
    kableExtra::add_indent(7:11) %>%
    kableExtra::add_indent(13:17) %>%
    kableExtra::add_indent(22:23) %>%
    kableExtra::add_indent(30:32) %>%
    kableExtra::add_indent(34:35) %>%
    kableExtra::add_indent(38:42)

knitr::include_graphics(
  path = here("analysis", "images", "lipids-tom-network-power-8.png"),
  dpi = 72
)
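The TODO in the table chunk notes that the values should eventually be computed directly from the data. One way to do that is sketched below in base R; the data frame `patients`, its columns, and the helper functions are hypothetical, and the age summary assumes the median.

```r
# Hypothetical patient-level data; in the project this would come from
# the cleaned study database.
patients <- data.frame(
  age = c(55, 63, 71, 48, 80),
  sex = c("male", "male", "female", "male", NA),
  diabetes = c(TRUE, FALSE, NA, FALSE, TRUE),
  stringsAsFactors = FALSE
)

# Format a proportion as a whole-number percentage, e.g. "70 %".
pct <- function(x) paste0(round(100 * x), " %")

# "n (p %)" of missing values, or NA when nothing is missing.
missing_summary <- function(x) {
  n_miss <- sum(is.na(x))
  if (n_miss == 0) NA_character_ else paste0(n_miss, " (", pct(n_miss / length(x)), ")")
}

# Share of TRUE values among the non-missing values of a binary variable.
binary_summary <- function(x) pct(mean(x, na.rm = TRUE))

baseline <- data.frame(
  Variable = c("Age", "Sex", "Diabetes"),
  Summary = c(
    sprintf("%.0f years; range %.0f-%.0f",
            median(patients$age), min(patients$age), max(patients$age)),
    paste0(pct(mean(patients$sex == "male", na.rm = TRUE)), " male"),
    binary_summary(patients$diabetes)
  ),
  Missing = c(
    missing_summary(patients$age),
    missing_summary(patients$sex),
    missing_summary(patients$diabetes)
  ),
  stringsAsFactors = FALSE
)
baseline
```

The resulting data frame has the same three columns as the hard-coded CSV, so it could be passed to the same `kable()` call.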

Schedule

The plan is to publish the results in three journal articles and to write a PhD thesis during 2020-2023. Most of the data have been collected, and data cleaning programs have been developed. The main remaining tasks include additional follow-up data collection and documentation, a literature review, data analysis and visualization, and reporting.

\clearpage



eteppo/tvs-project documentation built on Aug. 13, 2019, 8:53 a.m.