## Options for Rmarkdown compilation knitr::opts_chunk$set(fig.width = 7, fig.height = 5, fig.align = "center", out.width = "70%", message = FALSE, collapse = TRUE, crop = NULL ## Related to https://stat.ethz.ch/pipermail/bioc-devel/2020-April/016656.html ) ## Time the compilation timeStart <- Sys.time()
plexDIA (@Derks2022-yn) enables the profiling of the proteome of single cells using a multiplexed DIA data acquisition strategy. The pipeline includes the nPOP sample processing protocole (@Leduc2022-cc), with mTRAQ labeling of the samples. The authors used the DIA-NN software (@Demichev2020-rb) to identify and quantify the MS data.
Let's first load the replication package to make use of some helper functions. Those functions are only meant for this replication vignette and are not designed for general use.
library("SCP.replication")
scp
and the plexDIA data analysis workflowThe code provided along with the article can be retrieved from this GitHub repository. The objective of this vignette is to replicate the analysis script while providing standardized, easy-to-read, and well documented code. Therefore, our first contribution is to formalize the data processing into a conceptual flow chart.
knitr::include_graphics("figs/derks2022-workflow.png")
This replication vignette relies on a data framework dedicated to SCP data analysis that combines two Bioconductor classes (@Vanderaa2021-ue):
SingleCellExperiment
class provides an interface to many
cutting edge methods for single-cell analysis QFeatures
class facilitates manipulation and processing of
MS-based quantitative data. The scp
vignette
provides detailed information about the data structure. The scp
package extends the functionality of QFeatures
for single-cell
application. scp
offers a standardized implementation for single-cell
processing methods.
The required packages for running this workflow are listed below.
## Core packages of this workflow library("scp") library("scpdata") library("sva") ## Utility packages for data manipulation and visualization library("tidyverse") library("ggbeeswarm") library("ggrepel") library("patchwork") library("ggpointdensity")
scpdata
and the leduc2022
datasetWe also implemented a data package called scpdata
(@Vanderaa2022-qv).
It distributes
published SCP datasets, such as the derks2022
dataset. The datasets
were downloaded from the data source provided in the publication and
formatted to a QFeatures
object so that it is compatible with our
software. The underlying data storage is based on the ExperimentHub
package that provides a cloud-based storage infrastructure.
The derks2022
dataset is provided at different levels of
processing:
scpdata
. The workflow starts with precursor tables and will generate the processed protein data. The authors provided the DIA-NN output tables and the sample annotation table through a Google Drive repo. Protein data are shared a TXT files. We highly value the effort the authors have made to publicly share all the data generated in their project, from raw files to final expression tables (see the Slavov Lab website).
We formatted the derks2022
dataset following our data framework. The
formatted data can be retrieved from the scpdata
package using the
derks2022()
function. All datasets in scpdata
are called after
the first author and the date of publication.
The derks2022
data combines 3 main datasets: the 100-cell equivalent
samples acquired with a Q-Exactive instrument, the single-cell samples
acquired with a Q-Exactive instrument (we call this dataset qe
), and
the the single-cell samples acquired with a timsTOF-SCP instrument (we
call this dataset tims
). Each dataset contains the DIA-NN main report
data split over each run and the combined MS1 extracted data (see above
for more details). The data also contains the protein data for all
samples as processed by the authors.
(derks <- derks2022())
The datasets are stored in a QFeatures
object. In total, it contains
66 different SingleCellExperiment
objects
that we refer to as assays. Each assay contains expression data
along with feature metadata. Each row in an assay represents a
feature, in this case a precursor or a protein depending on the
assay. Each column in an assay represents a sample. During sample
preparation, 3 samples are pooled using mTRAQ labeling, hence some
assays contain 3 columns corresponding to each measured channel.
Using plot()
, we can have a quick overview of the assays.
plot(derks)
This figure is crowded, you can use plot(derks, interactive = TRUE)
to interactively explore this map. However, we can see that the
different runs in the top of the figure converge to 3 assays that
contain the extracted MS1 precursor signal for the three datasets:
bulk
, tims
and qe
. Finally, the protein table contains data
derived from all three datasets.
The objective of this vignette is to replicate the combined protein data table from the precursor assays following the same data processing workflow as the original study by Derks et al. but using standardized functionality.
We extract the proteins
assay and keep it for later benchmarking.
getWithColData()
extract an assay of interest along with the
associated sample annotations. We then remove the assay from the
S
object for the
remainder of the processing. We apply
this using removeAssays
.
proteins_derks <- getWithColData(derks, "proteins") derks <- removeAssay(derks, "proteins")
Before starting the data processing, we remove the samples that are
not considered by the authors to generate the protein data.
The timsTOF-SCP dataset was acquired with a few bulk
samples diluted to single-cell equivalent. These samples are ending
with "_t_DB"
. The Q-Exactive dataset contains single-cells, but also
a dozen negative controls (encoded as "Neg"
).
table(derks$Celltype, derks$dataset)
We remove those unwanted samples by accessing the information from the
colData
using subsetByColData()
.
derks <- subsetByColData(derks, !grepl("Neg|_t_DB", derks$Celltype))
The first data processing step is to correct for isotopic carry over
from one mTRAQ channel to another. This is performed using
correctIsotopicCarryover()
. The function takes an assay and returns
the same assay with corrected quantifications. Since there are 3
datasets, we will loop over the 3 assays containing the extracted MS1
signal. These corrected assays are added to the QFeatures
object
using addAssay()
and the links between the precursors are added using
for (i in c("bulk", "qe", "tims")) { inputAssay <- paste0(i ,"_prec_extracted") outputAssay <- paste0(i ,"_prec_corrected") x <- correctIsotopicCarryover(derks[[inputAssay]]) derks <- addAssay(derks, x, name = outputAssay) derks <- addAssayLinkOneToOne(derks, from = inputAssay, to = outputAssay) cat("Finished correction for the", i, "dataset\n") }
Precursors are then filtered to control for protein FDR. To perform
this, we will again loop over the 3 datasets as the set of protein that
comply to the FDR threshold is dataset-dependent. We retrieve the
assays that link to extracted signal assay as these contain the FDR
threshold. Once identified, we combine all the available rowData
in
a single table using rbindRowData()
. This table is then filtered to
retrieve the proteins with low FDR and we subset the corrected signal
assay to contain only precursors that map those proteins. Finally, we
restore any lost link between the precursors of corrected assay with
its parent assay using addAssayLink()
.
for (i in c("bulk", "qe", "tims")) { ## Find the assays that contain the FDR (q-value) information repAssays <- assayLink(derks, paste0(i ,"_prec_extracted"))@from ## Combine all rowData in a single table rd <- rbindRowData(derks, repAssays) ## Perform filtering PGs_0.01FDR <- unique(rd$Protein.Group[rd$Lib.PG.Q.Value < 0.01]) ## Apply the filter to the corrected signal assay x <- derks[[paste0(i ,"_prec_corrected")]] ind <- rowData(x)$Protein.Group %in% PGs_0.01FDR derks[[paste0(i ,"_prec_corrected")]] <- x[ind, ] ## AssayLinks are lost when removing proteins, we recreate them derks <- addAssayLink(derks, from = paste0(i ,"_prec_extracted"), to = paste0(i ,"_prec_corrected"), varFrom = "Precursor.Id", varTo = "Precursor.Id") cat("Finished FDR control for the", i, "dataset\n") }
discussion you find this chuk too complicated? Let's discuss this by opening an issue here.
Up to now, we kept the data belonging to each MS run in separate
assays. We now combine all batches into a single assay. This can
easily be done using the joinAssays()
function from the QFeatures
package.
We need to account for a limitation when combining the data.
joinAssays()
will only keep the metadata variables that have the
same value between matching rows. However, some precursors map to one
protein group in one run
and to another protein group in another run. Hence, the protein group is
not constant for all precursors and is removed during joining. It is
important we keep the protein group information in the rowData
since
we will later need it to aggregate precursors to proteins. To avoid
this issue, we replace the problematic precursor to protein group
mappings through a majority vote.
## Generate a list of DataFrames with the information to modify precAssays <- grep("corrected$", names(derks), value = TRUE) ppMap <- rbindRowData(derks, i = precAssays) %>% data.frame %>% group_by(Precursor.Id) %>% ## The majority vote happens here mutate(Protein.Group = names(sort(table(Protein.Group), decreasing = TRUE))[1]) %>% select(Precursor.Id, Protein.Group) %>% filter(!duplicated(Precursor.Id, Protein.Group)) consensus <- lapply(precAssays, function(i) { ind <- match(rowData(derks[[i]])$Precursor.Id, ppMap$Precursor.Id) DataFrame(Protein.Group = ppMap$Protein.Group[ind]) }) ## Name the list names(consensus) <- precAssays ## Modify the rowData rowData(derks) <- consensus
We now join the three datasets in a single assay. This is performed
using joinAssays()
.
(derks <- joinAssays(derks, i = grep("corrected$", names(derks)), name = "prec_corrected"))
The last assay, prec_corrected
contains the data for all 3 datasets.
Next, the authors require less than 60% missing data per cell. We
compute the percent missing data for each column. We store it in the
colData
using derks$ <-
for later use.
derks$pNA <- colMeans(assay(derks[["prec_corrected"]]) == 0, na.rm = TRUE)
In this step, the authors ignore NA
's and only count zeros. This
underestimates the missing data rate per-cell. An alternative is to
include NA's as well, although it may overestimate the missing data
rate when the different acquisition runs or dataset do not contain
overlapping proteins. This is shown by the code chunk below as an
example, but it is not executed and we'll stick to the author's
original procedure.
derks <- zeroIsNA(derks, "prec_corrected") derks$pNA <- nNA(derks, "prec_corrected")$nNAcols$pNA
We plot the percent missingness for each data set.
data.frame(colData(derks)) %>% ggplot() + aes(y = pNA, x = dataset, color = Celltype) %>% geom_beeswarm() + scale_size_manual(values = c(3, 0.8))
Most samples comply to the threshold. We remove the few single-cells
with too many missing data using subsetByColdata()
because we have
stored pNA
in the colData
.
derks <- subsetByColData(derks, derks$pNA < 0.6)
We replace the zero values by NA's using zeroIsNA()
.
derks <- zeroIsNA(derks, "prec_corrected")
The authors then apply a first batch correction by aligning the
precursor means across the different acquisition runs. First, we
extract the combined precursor assay along with the colData using
getWithColData()
and extract the quantification matrix from it using
assay()
sce <- getWithColData(derks, "prec_corrected") m <- assay(sce)
For each acquisition run, we divided each row by its mean.
for (batch in sce$File.Name) { ind <- which(sce$File.Name == batch) m[, ind] <- sweep(m[, ind, drop = FALSE], FUN = "/", MARGIN = 1, STATS = rowMeans(m[, ind, drop = FALSE], na.rm = TRUE)) } assay(sce) <- m
We add the batch corrected values as a new assay using
addAssay()
and keep the links with the previous assay using
addAssayLinkOneToOne()
.
sce$pNA <- NULL ## Remove pNA to avoid later conflicts derks <- addAssay(derks, sce, name = "prec_runnorm") derks <- addAssayLinkOneToOne(derks, from = "prec_corrected", to = "prec_runnorm")
The next step is to apply normalization. The authors perform it at the sample level and the precursor level.
Median normalization per sample is available among the methods from
normalizeSCP()
. It will automatically add a new assay that we call
prec_sampnorm
derks <- normalizeSCP(derks, "prec_runnorm", name = "prec_sampnorm", method = "div.median")
Mean normalization per sample is not available among the methods from
normalizeSCP()
, but can be applied using sweep()
offering more
flexibility.
rowm <- rowMeans(assay(derks[["prec_sampnorm"]]), na.rm = TRUE) derks <- sweep(derks, i = "prec_sampnorm", name = "prec_featnorm", MARGIN = 1, FUN = "/", STATS = rowm)
Up to now, the data was processed at the precursor level. One protein group can be represented by several precursors. For instance, there is one protein group that is represented by over 150 precursors.
hist(table(rowData(derks[["prec_featnorm"]])$Protein.Group), xlab = "Number precursors per protein group", main = "")
We aggregate all precursors belonging to the same protein group using
aggregateFeatures()
. In this case, fun = colMedians
will assign a
protein group abundance as the median precursor abundance.
derks <- aggregateFeatures(derks, i = "prec_featnorm", name = "prot", fcol = "Protein.Group", fun = colMedians, na.rm = TRUE)
After protein agregation, the authors perform again a batch correction round. We use the same approach as above.
sce <- getWithColData(derks, "prot") m <- assay(sce) for (batch in sce$File.Name) { ind <- which(sce$File.Name == batch) m[, ind] <- sweep(m[, ind, drop = FALSE], FUN = "/", MARGIN = 1, STATS = rowMeans(m[, ind, drop = FALSE], na.rm = TRUE)) } assay(sce) <- m sce$pNA <- NULL ## Remove pNA to avoid later conflicts derks <- addAssay(derks, sce, name = "prot_runnorm") derks <- addAssayLinkOneToOne(derks, from = "prot", to = "prot_runnorm")
Again, the batch correction is followed by sample and protein normalization. We use the same approach as above.
derks <- normalizeSCP(derks, "prot_runnorm", name = "prot_sampnorm", method = "div.median") rowm <- rowMeans(assay(derks[["prot_sampnorm"]]), na.rm = TRUE) derks <- sweep(derks, i = "prot_sampnorm", name = "prot_featnorm", MARGIN = 1, FUN = "/", STATS = rowm)
The protein data is log2-transformed using logTransform()
.
derks <- logTransform(derks, i = "prot_featnorm", name = "prot_log", base = 2)
Log-transformation has generated infinite values because zeros were
present in the data. We replace these infinite values and zeros
by NA's using infIsNA()
and zeroIsNA()
, respectively.
derks <- infIsNA(derks, "prot_log") derks <- zeroIsNA(derks, "prot_log")
Although we already performed a first filtering based on data missingness, we further filter on protein and sample missingness.
We filter out proteins with more than 95% missingness. This is
performed using filterNA()
that removes rows (proteins) with a
missingness higher than the given threshold.
derks <- filterNA(derks, i = "prot_log", pNA = 0.95)
We then filter out samples with more than 95% missingness. This is performed using the same procedure as above.
derks$pNA <- colMeans(is.na(assay(derks[["prot_log"]]))) derks <- subsetByColData(derks, derks$pNA < 0.95)
Although we filtered on missing data, the protein data is majorly composed of missing values. The graph below shows the distribution of the proportion missingness in cells. Cells contain on average 65 \% missing values.
data.frame(colData(derks)) %>% ggplot() + aes(y = pNA, x = dataset, color = Celltype) %>% geom_beeswarm()
The datasets containing single-cells have more than 50% missing data
that is imputed by the authors using a custom hierarchical clustering
function. This function is available from SCP.replication
as
imputeKnnSCoPE2()
.
derks <- imputeKnnSCoPE2(derks, i = "prot_log", name = "prot_imp", k = 3)
A final batch correction is applied, this time using the ComBat
algorithm from the sva
package.
sce <- getWithColData(derks, "prot_imp") batch <- sce$Label mod <- model.matrix(~ Celltype, data = colData(sce)) assay(sce) <- ComBat(assay(sce), batch = batch, mod = mod) derks <- addAssay(derks, sce, name = "prot_bc") derks <- addAssayLinkOneToOne(derks, from = "prot_imp", to = "prot_bc")
Again, the batch correction is followed by sample and protein normalization. We use the same approach as above.
derks <- normalizeSCP(derks, "prot_bc", name = "prot_bc_sampnorm", method = "center.median") rowm <- rowMeans(assay(derks[["prot_bc_sampnorm"]]), na.rm = TRUE) derks <- sweep(derks, i = "prot_bc_sampnorm", name = "prot_bc_featnorm", MARGIN = 1, FUN = "-", STATS = rowm)
This last normalization step leads to the final data as processed by Derks et al. We will now compare our results with the results published by the authors.
We first compare the data generated by the authors's orginal script and the data generated by this vignette.
Let's first compare the cells that were retained after processing.
allElements <- union(colnames(proteins_derks), colnames(derks[["prot_bc_featnorm"]])) table(derks2022 = allElements %in% colnames(proteins_derks), scp = allElements %in% colnames(derks[["prot_bc_featnorm"]]))
There is a good agreement between the set of filtered cells after the data processing by the authors and the data processing in this vignette.
Let's now compare the protein groups retained after processing.
allElements <- union(rownames(proteins_derks), rownames(derks[["prot_bc_featnorm"]])) table(derks2022 = allElements %in% rownames(proteins_derks), scp = allElements %in% rownames(derks[["prot_bc_featnorm"]]))
Most protein groups are found in both data processing workflows, but the agreement is low.
Let's now compare the quantitative data between the two matrices. To do so, we need to intersect the column names and the rownames.
rows <- intersect(rownames(proteins_derks), rownames(derks[["prot_bc_featnorm"]])) cols <- intersect(colnames(proteins_derks), colnames(derks[["prot_bc_featnorm"]])) err <- assay(proteins_derks)[rows, cols] - assay(derks[["prot_bc_featnorm"]])[rows, cols] data.frame(difference = as.vector(err[!is.na(err)])) %>% ggplot() + aes(x = difference) + geom_histogram(bins = 50) + xlab("plexDIA - scp") + scale_y_continuous(labels = scales::scientific) + theme_minimal()
There are very large differences between the two workflows. Note that
the differences are computed on a log2 scale. However, the mode of the
distribution peaks around 0. These differences may arise because we
had to change some implementations and constraints to use our scp
framework. For instance, we had to re-implement the isotopic carryover
correction using matrix operations instead of working with data.frame
s.
Furthermore, as mentioned when combining the datasets, we had to
perform a consensus vote when mapping precursors to protein groups
across different datasets. These decisions taken at the beginning may
have led to small deviations from the original workflow and may have
propagated to larger deviations throughout the data processing steps.
We here attempt to replicate figures 6b, 6c, 6h and 6i where the authors
show the number of precursors and protein groups that were found for
each cell type in each dataset. We first compute the number of non
missing precursors per sample using countUniqueFeatures()
on the
joined precursor assay, prec_corrected
. The counts are directly
stored in the colData
under nprecs
. Next, we do the same for the
protein groups. countUniqueFeatures()
allows to group the features
by a variable of the rowData
. In this case, we set
groupBy = "Protein.Groups"
. Note that we would get the same results
if we applied countUniqueFeatures
on the prot
assay, or any protein
level assay before imputation.
derks <- countUniqueFeatures(derks, i = "prec_corrected", colDataName = "nprecs") derks <- countUniqueFeatures(derks, i = "prec_corrected", groupBy = "Protein.Group", colDataName = "nprots")
We next retrieve the counts from the `colData' and create the plot. We perform some faceting to display precursor and protein counts for all datasets in a single panel.
df <- data.frame(colData(derks)) ggplot(df) + aes(x = Celltype, y = nprecs, fill = Celltype) + facet_wrap(~ dataset, scales = "free") + labs(y = "Precursors per cell") + ggplot(df) + aes(x = Celltype, y = nprots, fill = Celltype) + facet_wrap(~ dataset, scales = "free") + labs(y = "Protein groups per cell") + plot_layout(guides = "collect", ncol = 1) & geom_boxplot() & geom_beeswarm(alpha = 0.7) & xlab("") & theme_classic() & theme(legend.position = "none")
The plot is in line with the article's figure.
We here attempt to replicate figures 6f and 6l where the authors show the correlation between the PDAC vs monocyte log fold change computed from single-cell samples and from bulk samples. Although the authors devoted a separate data processing to create the plot, we here decide to keep compare single-cells and bulk using the same processing steps. We extract the processed assay, compute the fold changes between PDAC and monoyctes for the three datasets and compute the correlations between the single-cell and bulk log fold changes.
sce <- getWithColData(derks, "prot_bc_featnorm") m <- assay(sce) logFC <- data.frame(PvsU_tims = rowMeans(m[, sce$Celltype == "PDAC_t"], na.rm = TRUE) - rowMeans(m[, sce$Celltype == "U-937_t"], na.rm = TRUE), PvsU_qe = rowMeans(m[, sce$Celltype == "PDAC"], na.rm = TRUE) - rowMeans(m[, sce$Celltype == "U-937"], na.rm = TRUE), PvsU_bulk = rowMeans(m[, sce$Celltype == "PDAC_DB"], na.rm = TRUE) - rowMeans(m[, sce$Celltype == "U-937_DB"], na.rm = TRUE), Protein.Group = rownames(m)) logFC <- left_join(logFC, data.frame(Protein.Group = c("P17096;P17096-3", "P07437", "P08729"), Protein.Name = c("HMGA1", "TUBB", "KRT7")), by = c("Protein.Group")) corQE <- round(cor(logFC$PvsU_qe, logFC$PvsU_bulk, use = "pairwise.complete.obs", method = "spearman"), 2) corTims <- round(cor(logFC$PvsU_tims, logFC$PvsU_bulk, use = "pairwise.complete.obs", method = "spearman"), 2)
We then plot the results as performed in the article.
ggplot(logFC) + aes(x = PvsU_bulk, y = PvsU_qe) + ggtitle("Q-Exactive vs Bulk", subtitle = paste0("rho = ", corQE)) + ggplot(logFC) + aes(x = PvsU_bulk, y = PvsU_tims) + ggtitle("timsTOF-SCP vs Bulk", subtitle = paste0("rho = ", corTims)) + plot_layout() & geom_pointdensity() & labs(x = "Log2 PDAC/U-937 100 cells", y = "Log2 PDAC/U-937 1 cell") & scale_color_continuous(type = "viridis") & theme_classic() & theme(legend.position = "none") & geom_label_repel(aes(label = Protein.Name), min.segment.length = unit(0, 'lines'), box.padding = 2, size = 2.75)
We generated a similar figure as in the article.
We now plot the PCA to reproduce Figure 6p from the author's paper. We
use the pcaSCoPE2()
function that implements the weighted PCA as
suggested in @Specht2021-jm.
m <- assay(derks[["prot_bc_featnorm"]]) pcaRes <- pcaSCoPE2(m) ## Percent of variance explained by each principle component pca_var <- pcaRes$values percent_var<- pca_var/sum(pca_var)*100 ## PCA scores data pcaDf <- data.frame(PC = pcaRes$vectors[, 1:2], colData(derks)) ## Format the sample annotations pcaDf$Celltype[grepl("PDAC", pcaDf$Celltype)] <- "PDAC" pcaDf$Celltype[grepl("Mel", pcaDf$Celltype)] <- "Melanoma" pcaDf$Celltype[grepl("^U", pcaDf$Celltype)] <- "Monocyte" pcaDf$Label <- paste0("mTRAQ", pcaDf$Label)
We then plot the PCA results, colouring the cells either depending on the mTRAQ label used to identify tag-specific effects and depending on the cell type to illustrate the ability of the technology to capture biological variability.
ggplot(pcaDf) + aes(color = Label) + scale_color_manual(values = c("orange2","purple","turquoise")) + ggplot(pcaDf) + aes(color = Celltype) + scale_color_manual(values = c("#52B788","#457B9D","#E07A5F")) + plot_annotation(title = "PCA of the processed data using 'scp'", subtitle = paste0(sum(pcaDf$dataset != "Bulk"), " cells\n", nrow(m), " protein groups")) + plot_layout(guides = "collect") & aes(x = PC.1, y = PC.2, size = dataset, shape = dataset, alpha = dataset) & geom_point(alpha = 0.66) & labs(x = paste0("PC1 (", round(percent_var[1],0),"%)"), y = paste0("PC2 (", round(percent_var[2],0),"%)")) & scale_size_manual(values = c(5, 2, 2)) & theme_classic()
Although the we could not accurately replicate the data provided by the authors, we can clearly see the same global pattern as in the original work.
You can reproduce this vignette using Docker
:
docker pull cvanderaa/scp_replication_docker:v1 docker run \ -e PASSWORD=bioc \ -p 8787:8787 \ cvanderaa/scp_replication_docker:v1
Open your browser and go to http://localhost:8787. The USER is rstudio
and
the password is bioc
. You can find the vignette in the vignettes
folder.
See the website home page for more information.
The system details of the machine that built the vignette are:
sd <- benchmarkme::get_sys_details() cat("Machine: ", sd$sys_info$sysname, " (", sd$sys_info$release, ")\n", "R version: R.", sd$r_version$major, ".", sd$r_version$minor, " (svn: ", sd$r_version$`svn rev`, ")\n", "RAM: ", round(sd$ram / 1E9, 1), " GB\n", "CPU: ", sd$cpu$no_of_cores, " core(s) - ", sd$cpu$model_name, "\n", sep = "")
The total time required to compile this vignette is:
timing <- Sys.time() - timeStart cat(timing[[1]], attr(timing, "units"))
The final leduc
object size is:
format(object.size(derks), units = "GB")
sessionInfo()
This vignette is distributed under a CC BY-SA licence licence.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.