In sebastiz/PhysiMineR: Functions to extract, clean, manipulate and analyse upper gastrointestinal physiological parameters

Title and abstract

Study Design

Remember to use PICO

1.Patient population

In patients being investigated for GORD with a negative pH impedance study

2.Intervention

who go on to have a BRAVO study

3.Comparison

Is there a significant increase between the two tests

4.Outcome

in the patients who have GORD

5. Further

What can predict a BRAVO positive, pH impedance negative investigation to support a Straight-to-BRAVO test?

Diagnosis a) In patients being investigated for GORD with a negative impedance study who go on to have a BRAVO study, what proportion have a positive BRAVO study for GORD?
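
The diagnostic question above comes down to a proportion with a confidence interval. The following is a minimal sketch, assuming a 0/1 vector of BRAVO results for impedance-negative patients; the values in `bravo_positive` are invented for illustration and are not study data.

```r
# Hypothetical BRAVO results for impedance-negative patients (1 = positive); illustrative only
bravo_positive <- c(1, 0, 0, 1, 1, 0, 0, 0, 1, 0)

n_total    <- length(bravo_positive)
n_positive <- sum(bravo_positive)
prop_positive <- n_positive / n_total

# Exact (Clopper-Pearson) binomial confidence interval for the proportion
ci <- binom.test(n_positive, n_total)$conf.int

cat(sprintf("%d/%d (%.0f%%) BRAVO positive, 95%% CI %.0f%% to %.0f%%\n",
            n_positive, n_total, 100 * prop_positive, 100 * ci[1], 100 * ci[2]))
```

An exact interval is used here because the subgroup is likely to be small; other interval methods would be equally reasonable.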

$\color{red}{\text{A.Indicate the study’s design with a commonly used term in the title or the abstract}}$.

Example 1

Title: ‘An observational study with long-term follow-up of canine cognitive dysfunction: Clinical characteristics, survival and risk factors’ (Fast et al., 2013).

Example 2 Title: ‘Case-control study of risk factors associated with Brucella melitensis on goat farms in Peninsular Malaysia’ (Bamaiyi et al., 2014).

Explanation Including the study design term in the title or abstract when a standard study design is used, or at least identifying that a study is observational, allows the reader to easily identify the design and helps to ensure that articles are correctly indexed in electronic databases (Benson and Hartz, 2000). In STROBE, item 1a only requests that a common study design term be used. However, in veterinary research, not all observational studies are easily categorized into cohort, case–control, or cross-sectional study designs. Therefore, we recommend including that the study was observational and, if possible, the study design or important design characteristics, for example longitudinal, in the title.

title: "Template for STROBE presentation at conference abstracts"
author: "Sebastian Zeki"
date: "04/09/2018"
output: html_document

knitr::opts_chunk$set(echo = TRUE)

Introduction

$\color{red}{\text{B.Introductory sentence 1: Define the disease condition}}$.

eg Aperistaltic oesophagus is a term that covers all conditions where achalasia is absent and there is no peristaltic activity

$\color{red}{\text{C.Introductory sentence 2: The overview of the disease condition}}$.

eg Aperistaltic oesophagus remains a difficult condition to treat with no specific and effective treatments

$\color{red}{\text{D.Introductory sentence 3: The problem being addressed}}$.

eg The number of non-achalasia aperistaltic oesophagus patients is unknown.

eg Although often isolated from outpatients in veterinary clinics, there is concern that MRSP follows a veterinary-hospital associated epidemiology

Aim

eg The aim of the study was to assess the number of patients with aperistaltic oesophagus without achalasia and ascertain their demographics

Methods

$\color{red}{\text{E.Define the Dates}}$.

Get the dates from the minimum of the VisitDate column r 1+1
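
One way to derive the study period is sketched below, assuming a data frame with a `VisitDate` column of class `Date`. The `visits` data frame and its dates are hypothetical.

```r
# Hypothetical visit dates; in the analysis these would come from the VisitDate column
visits <- data.frame(
  VisitDate = as.Date(c("2010-03-14", "2008-06-02", "2018-01-25", "2015-11-09"))
)

study_start <- min(visits$VisitDate, na.rm = TRUE)
study_end   <- max(visits$VisitDate, na.rm = TRUE)

# Build the sentence fragment for the Methods text
study_period <- paste("between", format(study_start, "%Y-%m-%d"),
                      "and", format(study_end, "%Y-%m-%d"))
print(study_period)
```

The resulting string can then be dropped into the text with inline R so that the reported dates always match the data.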

$\color{red}{\text{F.Define the Location}}$. Standard sentence here about GSTT. The results of high-resolution studies performed in patients with cough as part of the presenting complaint to the gastroenterology service at Guy's and St Thomas' NHS Trust between June 2008 and January 2018 were examined retrospectively.

$\color{red}{\text{F.Define the Eligibility}}$. All patients selected were adults over the age of 18. Ethical approval was granted (IRAS number) by the local ethics board.

$\color{red}{\text{F.Define the Exclusion}}$. Patients with incomplete data sets were excluded from the study. Where x was not available from the report, the raw file was located and manually analysed.

$\color{red}{\text{G.Define the Sampling}}$.

Patients were selected according to Chicago Classification v3. Patients' symptoms were taken from self-reporting at impedance if this was done at the same time as high-resolution manometry. If this was not the case, the symptoms were extracted either from the final report summarising the patient's condition and physiological diagnosis, or from the clinician's details when ordering the investigation.

$\color{red}{\text{H.Define the data cleaning methodology methods}}$.

This describes how the relevant columns were accordionised (eg how the patients were categorised and how subgroups were formed).
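
As an illustration of this kind of categorisation, the sketch below collapses a continuous measurement into two subgroups. The column `PercentTimeAcid`, the data and the cut-off `acid_threshold` are all hypothetical and are not taken from the package.

```r
# Hypothetical columns and values, for illustration only
studies <- data.frame(
  HospNum_Id      = c("A1", "B2", "C3", "D4"),
  PercentTimeAcid = c(1.2, 7.5, 4.0, 12.3)
)
acid_threshold <- 4.2  # assumed cut-off, not taken from the source

# Collapse a continuous measurement into a two-level category
studies$AcidReflux <- ifelse(studies$PercentTimeAcid > acid_threshold, "Acid", "NoAcid")

# Subgroup sizes
print(table(studies$AcidReflux))
```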

library(PhysiMineR)
library(DiagrammeR)
library(CodeDepends)
library(tidyverse)
library(readxl)
library(EndoMineR)
library(dplyr)


######################################### Data acquisition######################################### 
source("/home/rstudio/PhysiMineR/R/sourcing_Data/PhysiData_Acq.R")

######################################### Data cleaning & Data merging######################################### 
source("/home/rstudio/PhysiMineR/R/sourcing_Data/PhysiData_Clean.R")

######################################### Data accordionisation######################################### 
source("/home/rstudio/PhysiMineR/R/sourcing_Data/PhysiData_Accordion.R")

######################################### Data forking (filtering and subsetting) #########################

#Get the patients who have had both the BRAVO and impedance study- need to cross match 
#Get all the negative Impedance results
#dataImpSypmAndImpedanceMain<-dataImpSypmAndImpedanceMain[!is.na(dataImpSypmAndImpedanceMain$AcidReflux),]

HRMImportMain$LOS_relax<-as.factor(HRMImportMain$LOS_relax)
HRMImportMain$LowerOesoph<-as.factor(HRMImportMain$LowerOesoph)


#Here I merge the impedance dataset and then the HRM dataset. The tests need to be merged based on the nearest date to prevent duplication etc.
ImpAndHRM <- dataImpSypmAndImpedanceMain %>%
  inner_join(HRMImportMain, by ="HospNum_Id") %>%
  mutate(Date_ABS_Diff = abs(VisitDate.x - VisitDate.y)) %>%
  arrange(HospNum_Id, Date_ABS_Diff) %>%
  group_by(HospNum_Id) %>%
  slice(1)
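
The nearest-date matching used in the merge above can be seen on a small example. This is a sketch in base R with invented identifiers and dates; it mirrors the join, absolute date difference, sort and keep-first pattern rather than reproducing the package code.

```r
# Toy data: hypothetical identifiers and dates
imp <- data.frame(HospNum_Id = c("A", "B"),
                  VisitDate  = as.Date(c("2015-01-10", "2016-05-01")))
hrm <- data.frame(HospNum_Id = c("A", "A", "B"),
                  VisitDate  = as.Date(c("2012-03-01", "2015-01-12", "2016-04-20")))

# Pair every impedance study with every HRM study for the same patient
pairs <- merge(imp, hrm, by = "HospNum_Id", suffixes = c(".x", ".y"))
pairs$Date_ABS_Diff <- abs(as.numeric(pairs$VisitDate.x - pairs$VisitDate.y))

# Keep the pair with the smallest gap for each patient
pairs   <- pairs[order(pairs$HospNum_Id, pairs$Date_ABS_Diff), ]
nearest <- pairs[!duplicated(pairs$HospNum_Id), ]
print(nearest)
```

Patient A has two HRM studies, and only the one two days from the impedance study survives, so each patient contributes a single row.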

#Now you need to get all the BRAVOs that are closest to the impedance
BravoHRMAndImpedance <- ImpAndHRM %>%
  inner_join(AllBravo, by ="HospNum_Id") %>%
  mutate(Date_ABS_Diff = abs(VisitDate.x - VisitDate.y.x)) %>%
  arrange(HospNum_Id, Date_ABS_Diff) %>%
  group_by(HospNum_Id) %>%
  slice(1)

#Hash the id's for the patients
library(digest)
BravoHRMAndImpedance$HospNum_Id<-unlist(lapply(BravoHRMAndImpedance$HospNum_Id, function(x) digest(x,algo="sha1")))


#Merge all hrm with AllBravo to get the full dataset
#This merges the tests so that the nearest hrm to the BRAVO is chosen to avoid replication
#Note that a lot of Bravo's do not have HRM as may not have been requested by outside referrers


#Then try to ascertain positive and negative BRAVOs ("NoAcid" marks a negative study)
BravoHRMAndImpedanceNEGBRAVO<-BravoHRMAndImpedance[grepl("NoAcid",BravoHRMAndImpedance$AcidRefluxBRAVO),]
BravoHRMAndImpedancePOSBRAVO<-BravoHRMAndImpedance[!grepl("NoAcid",BravoHRMAndImpedance$AcidRefluxBRAVO),]

#Really exploratory data analysis
#Exploratory data analysis- use Data Explorer package or GGally package
######Univariate analysis....
#Show all the missing data in each column
#Overall things to report in multivariate analysis from https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5457877/

#(1) Selection of independent variables; 
#(2) Fitting procedure; 
#(3) Coding of variables; 
#(4) Interactions; 
#(5) Collinearity; 
#(6) Statistical significance (OR, 95% confidence interval [CI], P value); 
#(7) Goodness-of-fit; 
#(8) Checking for outliers; 
#(9) Complete identification of the statistical software application that was used; 
#(10) Sufficient events (>10) per variable; 
#(11) Participation of statisticians and epidemiologists; and 
#(12) Conformity with linear gradient.


library(DataExplorer)

#Steps:
#1. Get rid of bad columns with missing data

#Tells you which columns have the most missing data so they can be excluded
missing<-profile_missing(BravoHRMAndImpedance)


missing<-data.frame(missing)
#Only columns with <10% missing data are accepted
BravoHRMAndImpedanceNoMissing<-missing[!grepl("Bad|Remove",missing$group),]

#Pick the columns with good data in them from this set
BravoHRMAndImpedanceNoMissing<-BravoHRMAndImpedance[,match(as.character(BravoHRMAndImpedanceNoMissing$feature),names(BravoHRMAndImpedance))]
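
The filter above relies on a `group` label in the missing-data profile. If that column is not available in the installed DataExplorer version (worth checking), the same rule can be written directly on the fraction of missing values. A minimal base R sketch on toy data, with the 10% threshold taken from the comment above:

```r
# Toy data with hypothetical columns
df <- data.frame(
  a = 1:20,                      # no missing values
  b = c(rep(NA, 6), 7:20),       # 30% missing
  c = c(NA, 2:20)                # 5% missing
)

# Fraction of missing values per column
pct_missing <- colMeans(is.na(df))

# Keep only the columns with less than 10% missing data
keep    <- names(pct_missing)[pct_missing < 0.10]
df_kept <- df[, keep, drop = FALSE]
print(names(df_kept))
```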


library(psych)
#Only select the numerical variables

#2.Recoding
BravoHRMAndImpedance$AcidRefluxBRAVO<-gsub("NoAcid",0,BravoHRMAndImpedance$AcidRefluxBRAVO)
BravoHRMAndImpedance$AcidRefluxBRAVO<-gsub(".*Acid",1,BravoHRMAndImpedance$AcidRefluxBRAVO)
BravoHRMAndImpedance$Hiatalhernia<-ifelse(BravoHRMAndImpedance$Hiatalhernia=="Yes",1,0)
BravoHRMAndImpedance$AcidRefluxBRAVO<-as.numeric(BravoHRMAndImpedance$AcidRefluxBRAVO)



#3. Get some basic descriptive stats:
rr<-describe(BravoHRMAndImpedance)

#4. Get rid of bad outliers

#5. Transform variables

#Select HRM only. Also get rid of DistalLESfromnarescm, ProximalLESfromnarescm and EsophageallengthLESUEScenterscm, as these are collinear and together reflect the length of the oesophagus

#4. Variable selection- choose HRM variables only here. Got rid of BasalrespiratorymeanmmHg, ContractileFrontVelocity, HiatusHernia and most oesophageal length measurements because of collinearity concerns with other variables
NegImpALL_BRAVO_WithImpAndHRMNoMissingFinal<-NegImpALL_BRAVO_WithImpAndHRMNoMissing %>%
  ungroup() %>%
  select(
    BasalrespiratoryminmmHg,
    largebreaks,
    AcidRefluxBRAVO
  )

#Check for multicollinearity
library("PerformanceAnalytics")
chart.Correlation(NegImpALL_BRAVO_WithImpAndHRMNoMissingFinal, histogram=TRUE, pch=1)
#Exploratory data analysis:
create_report(NegImpALL_BRAVO_WithImpAndHRMNoMissingFinal)

#Consider changing variables and repeat step 4

library(MASS)
# Fit the full model (binomial family, because AcidRefluxBRAVO is a 0/1 outcome)
data<-na.omit(NegImpALL_BRAVO_WithImpAndHRMNoMissingFinal)
full.model <- glm(AcidRefluxBRAVO ~ ., data = data, family = binomial)
# Stepwise regression model
step.model <- stepAIC(full.model, direction = "both", 
                      trace = FALSE)

summary(step.model)

library(pROC)
predpr <- predict(full.model,type=c("response"))
roccurve <- roc(AcidRefluxBRAVO ~ predpr, data = data)
plot(roccurve)
auc(roccurve)

detach("package:MASS", unload=TRUE)
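
Item (6) of the reporting checklist above asks for odds ratios with 95% confidence intervals. The sketch below shows one way to obtain them from a logistic model, using simulated data. The variables `x1`, `x2` and `outcome` are invented, and Wald intervals are used for simplicity; profile-likelihood intervals are an alternative.

```r
set.seed(1)
n <- 200
toy <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
# Simulated binary outcome that depends on x1 only
toy$outcome <- rbinom(n, 1, plogis(-0.5 + 1.2 * toy$x1))

# family = binomial makes this a logistic regression
fit <- glm(outcome ~ x1 + x2, data = toy, family = binomial)

# Odds ratios with Wald 95% confidence intervals
or_table <- exp(cbind(OR = coef(fit), confint.default(fit)))
print(round(or_table, 2))

# Fitted probabilities, the scale a ROC analysis would use
pred <- predict(fit, type = "response")
print(range(pred))
```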

#Univariate analysis:
NegImpPosBRAVOFinal<-NegImpALL_BRAVO_WithImpAndHRMNoMissingFinal[NegImpALL_BRAVO_WithImpAndHRMNoMissingFinal$AcidRefluxBRAVO==1,]
NegImpNegBRAVOFinal<-NegImpALL_BRAVO_WithImpAndHRMNoMissingFinal[NegImpALL_BRAVO_WithImpAndHRMNoMissingFinal$AcidRefluxBRAVO==0,]


#Univariate analysis
#Note: prematurecontraction and Distallatency are not in the select() above and need adding to it before their tests will run
t.test(NegImpPosBRAVOFinal$BasalrespiratoryminmmHg ,NegImpNegBRAVOFinal$BasalrespiratoryminmmHg,paired=FALSE)
t.test(NegImpPosBRAVOFinal$prematurecontraction,NegImpNegBRAVOFinal$prematurecontraction,paired=FALSE)
t.test(NegImpPosBRAVOFinal$Distallatency,NegImpNegBRAVOFinal$Distallatency,paired=FALSE)
t.test(NegImpPosBRAVOFinal$largebreaks,NegImpNegBRAVOFinal$largebreaks,paired=FALSE)

#Use the data set loaded above

library(psych)
#Only select the numerical variables

#4. Get rid of bad outliers

#5. Transform variables


#Select HRM only. Also get rid of DistalLESfromnarescm,ProximalLESfromnarescm,EsophageallengthLESUEScenterscm as these are colinear and reflect length of oesophagus altogether)

#4. Variable selection- choose HRM variables only here. Got rid of BasalrespiratorymeanmmHg,ContractileFrontVelocity,HiatusHernia and most oesophageal length measurements because of colinearity concerns with other variables

#Variables removed: MainReflxSettingsAnalysisDurationUpright(as is a 24 hour clock) 
#MainAcidCompositeScorePatientValueRecumbentTimeInReflux,
#MainAcidCompositeScorePatientValueEpisodesOver5min,
#MainAcidCompositeScorePatientValueLongestEpisode,
#MainAcidCompositeScorePatientValueTotalEpisodes,
#MainAcidCompositeScorePatientScoreTotalEpisodes,
#MainAcidCompositeScorePatientScoreUprightTimeInReflux,
#MainAcidCompositeScorePatientScoreRecumbentTimeInReflux,
#MainAcidCompositeScorePatientScoreTotalTimeInReflux,
#MainAcidCompositeScorePatientScoreEpisodesOver5min,
#MainAcidCompositeScorePatientScoreLongestEpisode,
#MainAcidCompositeScorePatientValueUprightTimeInReflux,(all the composite scores are likely to be related to the other scores anyway)
#MainPPPostprandExpoAnalysisDurationUpright,
#MainPPPostprandExpoAnalysisDurationRecumbent,
#MainPPPostprandExpoAnalysisDurationTotal,
#MainReflxSettingsAnalysisDurationRecumbent,
#MainReflxSettingsAnalysisDurationTotal,(all the duration are 24 hour clock times)
#MainAcidExpUprightGastricChannelTimepHLessThan40, (gastric channel unlikely to be relevant)
#MainAcidExpRecumbentGastricChannelTimepHLessThan40, (gastric channel unlikely to be relevant)

#Postprandial exclude for now:
# MainPPBolusExpoAcidPercentTime,
# MainPPBolusExpoNonacidTime,
# MainPPBolusExpoNonacidPercentTime,
# MainPPBolusExpoAllRefluxTime,
# MainPPBolusExpoAllRefluxPercentTime,
# MainPPBolusExpoMedianBolusClearanceTime,
# PPReflEpisodeActExpoAcid,
# PPReflEpisodeActExpoNonacid,
# PPReflEpisodeActExpoAllReflux,
# MainPPBolusExpoAcidTime,
# MainPPAcidExpoClearanceChannelNumberofAcidEpisodes,
# MainPPAcidExpoClearanceChannelTime,
# MainPPAcidExpoClearanceChannelPercentTime,
# MainPPAcidExpoClearanceChannelMeanAcidClearanceTime,
# MainPPAcidExpoClearanceChannelLongestEpisode,
# MainPPAcidExpoGastricChannelPercentTime,
# MainAcidExpTotalGastricChannelTimepHLessThan40,

# MainRflxEpisodeUprightAcid,
# MainRflxEpisodeRecumbentAcid,
# MainRflxEpisodeTotalAcid,
# MainRflxEpisodeUprightNonacid,
# MainRflxEpisodeRecumbentNonacid,
# MainRflxEpisodeTotalNonacid,
# MainRflxEpisodeUprightAllReflux,
# MainRflxEpisodeRecumbentAllReflux,
# MainRflxEpisodeTotalAllReflux,

#This lot likely to already be represented under PercentTime
# MainAcidExpUprightClearanceChannelTime,
# MainAcidExpRecumbentClearanceChannelTime,
# MainAcidExpTotalClearanceChannelTime,
# MainAcidExpUprightClearanceChannelMeanAcidClearanceTime,
# MainAcidExpRecumbentClearanceChannelMeanAcidClearanceTime,
# MainAcidExpTotalClearanceChannelMeanAcidClearanceTime,

# Colinearity issues:
# MainAcidExpUprightClearanceChannelPercentTime,
# MainAcidExpUprightClearanceChannelNumberofAcidEpisodes,

# Don't seem to be contributing much to the model:
# MainAcidExpRecumbentClearanceChannelLongestEpisode
# MainAcidExpTotalClearanceChannelLongestEpisode

NegImpALL_BRAVO_WithImpNoMissing<- NegImpALL_BRAVO_WithImpAndHRMNoMissing %>%
  ungroup() %>%
  select(
    MainAcidExpRecumbentClearanceChannelNumberofAcidEpisodes,
    MainAcidExpTotalClearanceChannelNumberofAcidEpisodes,
    MainAcidExpRecumbentClearanceChannelPercentTime,
    MainAcidExpTotalClearanceChannelPercentTime,
    MainAcidExpUprightClearanceChannelLongestEpisode,
    MainAcidCompositeScorePatientValueTotalTimeInReflux,
    MainAcidCompositeScorePatientValueRecumbentTimeInReflux,
    MainAcidCompositeScorePatientValueUprightTimeInReflux,
    AcidRefluxBRAVO
  )

NegImpALL_BRAVO_WithImpNoMissing<-dplyr::select_if(NegImpALL_BRAVO_WithImpNoMissing, is.numeric)

#Check for multicollinearity
library("PerformanceAnalytics")
chart.Correlation(NegImpALL_BRAVO_WithImpNoMissing, histogram=FALSE, pch=1)
cor(NegImpALL_BRAVO_WithImpNoMissing, method = "pearson", use = "complete.obs")

#Exploratory data analysis:
create_report(NegImpALL_BRAVO_WithImpNoMissing)

#Consider changing variables and repeat step 4
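
When the correlation matrix is large, it can help to list only the pairs that exceed a chosen threshold before deciding which variables to drop. A minimal sketch on simulated data follows; the variables and the 0.8 threshold are assumptions made for illustration.

```r
set.seed(42)
toy <- data.frame(x1 = rnorm(100))
toy$x2 <- toy$x1 + rnorm(100, sd = 0.05)  # nearly a copy of x1
toy$x3 <- rnorm(100)                      # unrelated to the others

cor_mat <- cor(toy, method = "pearson", use = "complete.obs")

# List each pair once (upper triangle) where |r| exceeds the assumed threshold
threshold <- 0.8
idx <- which(abs(cor_mat) > threshold & upper.tri(cor_mat), arr.ind = TRUE)
flagged <- data.frame(var1 = rownames(cor_mat)[idx[, "row"]],
                      var2 = colnames(cor_mat)[idx[, "col"]],
                      r    = cor_mat[idx])
print(flagged)
```

Only one variable from each flagged pair would normally be carried forward into the model.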

#Running multivariate analysis
library(MASS)
# Fit the full model (binomial family, because AcidRefluxBRAVO is a 0/1 outcome)
data<-na.omit(NegImpALL_BRAVO_WithImpNoMissing)
full.model <- glm(AcidRefluxBRAVO ~ ., data = data, family = binomial)
# Stepwise regression model
step.model <- stepAIC(full.model, direction = "both", trace = FALSE)
summary(step.model)

library(pROC)
predpr <- predict(full.model,type=c("response"))
roccurve <- roc(AcidRefluxBRAVO ~ predpr, data = data)
plot(roccurve)
auc(roccurve)

detach("package:MASS", unload=TRUE)

#Univariate analysis:
NegImpPosBRAVO<-NegImpALL_BRAVO_WithImpNoMissing[NegImpALL_BRAVO_WithImpNoMissing$AcidRefluxBRAVO==1,]
NegImpNegBRAVO<-NegImpALL_BRAVO_WithImpNoMissing[NegImpALL_BRAVO_WithImpNoMissing$AcidRefluxBRAVO==0,]


#3. Check normality
#Note: PIPfromnarescm is not in the select() above and needs adding to it before this will run
library(ggpubr)
ggqqplot(NegImpPosBRAVO$PIPfromnarescm)


#Univariate analysis
t.test(NegImpPosBRAVO$MainAcidExpUprightClearanceChannelLongestEpisode,NegImpNegBRAVO$MainAcidExpUprightClearanceChannelLongestEpisode,paired=FALSE)

#Get a ROC curve
library(ROCR)
#Store the prediction object so that performance() can use it
pred <- prediction(NegImpALL_BRAVO_WithImpNoMissing$MainAcidExpUprightClearanceChannelLongestEpisode, NegImpALL_BRAVO_WithImpNoMissing$AcidRefluxBRAVO)

perf1 <- performance(pred, "sens", "spec")
plot(perf1)

library(OptimalCutpoints)
NegImpALL_BRAVO_WithImpNoMissing<-data.frame(NegImpALL_BRAVO_WithImpNoMissing,stringsAsFactors = FALSE)
#AcidRefluxBRAVO is coded 0 = negative and 1 = positive, so the healthy tag is 0
optimal.cutpoints(X="MainAcidCompositeScorePatientValueUprightTimeInReflux",status="AcidRefluxBRAVO",tag.healthy=0,methods = "Youden",data=NegImpALL_BRAVO_WithImpNoMissing, pop.prev = NULL,
control = control.cutpoints(), ci.fit = FALSE, conf.level = 0.95, trace = FALSE)
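
For reference, the Youden criterion requested above picks the cut-off that maximises sensitivity + specificity - 1. The sketch below computes it by hand in base R on invented data. It is not the OptimalCutpoints implementation, and it assumes that higher marker values indicate disease.

```r
# Hypothetical marker values and disease status (1 = diseased); illustrative only
marker <- c(1.0, 2.0, 2.5, 3.0, 4.0, 5.0, 6.0, 7.5)
status <- c(0,   0,   0,   1,   1,   0,   1,   1)

# Evaluate every observed value as a candidate cut-off (test positive if marker >= cut)
cutoffs <- sort(unique(marker))
youden <- sapply(cutoffs, function(cut) {
  test_pos    <- marker >= cut
  sensitivity <- mean(test_pos[status == 1])
  specificity <- mean(!test_pos[status == 0])
  sensitivity + specificity - 1
})

best_cutoff <- cutoffs[which.max(youden)]
print(data.frame(cutoff = cutoffs, youden = youden))
print(best_cutoff)
```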

NegImpALL_BRAVO_WithImpNoMissing$failedChicagoClassification<-as.factor(NegImpALL_BRAVO_WithImpNoMissing$failedChicagoClassification)
NegImpALL_BRAVO_WithImpNoMissing$panesophagealpressurization<-as.factor(NegImpALL_BRAVO_WithImpNoMissing$panesophagealpressurization)
NegImpALL_BRAVO_WithImpNoMissing$largebreaks<-as.factor(NegImpALL_BRAVO_WithImpNoMissing$largebreaks)
NegImpALL_BRAVO_WithImpNoMissing$prematurecontraction<-as.factor(NegImpALL_BRAVO_WithImpNoMissing$prematurecontraction)
NegImpALL_BRAVO_WithImpNoMissing$rapidcontraction<-as.factor(NegImpALL_BRAVO_WithImpNoMissing$rapidcontraction)
NegImpALL_BRAVO_WithImpNoMissing$smallbreaks<-as.factor(NegImpALL_BRAVO_WithImpNoMissing$smallbreaks)

NegImpALL_BRAVO_WithImpNoMissing<- NegImpALL_BRAVO_WithImpAndHRMNoMissing %>%
  ungroup() %>%
  select(
    IntraabdominalLESlengthcm,
    Hiatalhernia,
    LOS_relax,
    LowerOesoph,
    BasalrespiratoryminmmHg,
    ResidualmeanmmHg,
    DistalcontractileintegralmeanmmHgcms,
    IntraboluspressureATLESRmmHg,
    Distallatency,
    failedChicagoClassification,
    panesophagealpressurization,
    largebreaks,
    prematurecontraction,
    rapidcontraction,
    smallbreaks,
    MainAcidExpRecumbentClearanceChannelNumberofAcidEpisodes,
    MainAcidExpTotalClearanceChannelNumberofAcidEpisodes,
    MainAcidExpRecumbentClearanceChannelPercentTime,
    MainAcidExpTotalClearanceChannelPercentTime,
    MainAcidExpUprightClearanceChannelLongestEpisode,
    MainAcidCompositeScorePatientValueTotalTimeInReflux,
    MainAcidCompositeScorePatientValueRecumbentTimeInReflux,
    MainAcidCompositeScorePatientValueUprightTimeInReflux,
    AcidRefluxBRAVO
  )


NegImpALL_BRAVO_WithImpNoMissing<-dplyr::select_if(NegImpALL_BRAVO_WithImpNoMissing, is.numeric)


#Check for multicollinearity
library("PerformanceAnalytics")
chart.Correlation(NegImpALL_BRAVO_WithImpNoMissing, histogram=FALSE, pch=1)
cor(NegImpALL_BRAVO_WithImpNoMissing, method = "pearson", use = "complete.obs")

#Exploratory data analysis:
create_report(NegImpALL_BRAVO_WithImpNoMissing)

library(MASS)
# Fit the full model (binomial family, because AcidRefluxBRAVO is a 0/1 outcome)
data<-na.omit(NegImpALL_BRAVO_WithImpNoMissing)
full.model <- glm(AcidRefluxBRAVO ~ ., data = data, family = binomial)
# Stepwise regression model
step.model <- stepAIC(full.model, direction = "both", trace = FALSE)
summary(step.model)

library(pROC)
predpr <- predict(full.model,type=c("response"))
roccurve <- roc(AcidRefluxBRAVO ~ predpr, data = data)
plot(roccurve)
auc(roccurve)

detach("package:MASS", unload=TRUE)

#Univariate analysis:
NegImpPosBRAVO<-NegImpALL_BRAVO_WithImpNoMissing[NegImpALL_BRAVO_WithImpNoMissing$AcidRefluxBRAVO==1,]
NegImpNegBRAVO<-NegImpALL_BRAVO_WithImpNoMissing[NegImpALL_BRAVO_WithImpNoMissing$AcidRefluxBRAVO==0,]


t.test(NegImpPosBRAVO$Distallatency,NegImpNegBRAVO$Distallatency,paired=FALSE)

sebastiz/PhysiMineR documentation built on Oct. 3, 2023, 3:46 p.m.
