
View source: R/segmentExpression2CopyNumber.R

Maps single cell expression profiles to copy number profiles.

```
segmentExpression2CopyNumber(eps, gpc, cn, seed=0, outF=NULL, maxPloidy=8,
                             nCores=2, stdOUT="log.applyAR2seg")
```

| Argument | Description |
| --- | --- |
| `eps` | Segment-by-cell matrix of expression. |
| `gpc` | Number of genes expressed per cell. |
| `cn` | Average copy number across cells for each segment (i.e. each row in `eps`). |
| `seed` | The fraction of entries in the a priori segment-by-cell copy number matrix to be used as seed for association rule mining. |
| `outF` | Output file prefix under which intermediary heatmaps and histograms are printed, or NULL (default) to print nothing. |
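
The argument descriptions imply a few shape constraints: one `gpc` value per cell (column of `eps`), one `cn` value per segment (row of `eps`), and a `seed` that is a fraction. A minimal sketch of a pre-flight check follows. It assumes `eps` is a numeric matrix, and `checkInputs` is a hypothetical helper written for this illustration, not a function of the package.

```r
# Hypothetical helper (not part of the package): checks that the inputs
# agree with the argument descriptions above.
checkInputs <- function(eps, gpc, cn, seed = 0) {
  stopifnot(
    is.matrix(eps), is.numeric(eps),
    length(gpc) == ncol(eps),   # one gene count per cell (column of eps)
    length(cn) == nrow(eps),    # one average copy number per segment (row of eps)
    is.numeric(seed), length(seed) == 1,
    seed >= 0, seed <= 1        # seed is described as a fraction of entries
  )
  invisible(TRUE)
}

# Toy data (made up): 3 segments x 4 cells
eps <- matrix(runif(12), nrow = 3,
              dimnames = list(paste0("seg", 1:3), paste0("cell", 1:4)))
gpc <- c(cell1 = 1200, cell2 = 950, cell3 = 1430, cell4 = 1100)
cn  <- c(2, 3, 1.5)
ok  <- checkInputs(eps, gpc, cn, seed = 0.5)
```

Running such a check before the call can surface a mismatch between `rownames(eps)` and the segment table early, instead of inside the parallel workers.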

Let S := { *S_1, S_2, ... S_n* } be the set of *n* genomic segments obtained from bulk DNA-sequencing. The segment-by-cell expression matrix is first normalized by gene coverage. Let *EN_{ij}* and *G_{ij}* be the average number of UMIs and the number of expressed genes per segment *j* per cell *i*.
The linear regression model

*EN_{\cdot x} \sim \sum_{j \in S} G_{\cdot j}*

fits the average segment expression per cell onto the cell's overall expression, for each *x \in S*; the dot in the first index stands for all cells. The model's residuals *R_{ij}* reflect inter-cell differences in expression per segment that cannot be explained by differential gene coverage per cell. A first approximation of the cell-by-segment copy number matrix CN is given by:

*CN_{ij} := R_{ij} \cdot (cn_j / μ_j)*

where *μ_j = mean_x(R_{xj})* is the mean residual per segment across cells and *cn_j* is the population-average copy number of segment *j* derived from DNA-seq.

The above transformation of *EN_{ij}* into *CN_{ij}* is in essence a numerical optimization: it shifts the distribution of each segment so that its mean across cells matches the average value expected from bulk DNA-seq.
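
The rescaling step can be illustrated with a few lines of R. This is only a sketch of the formula *CN_{ij} := R_{ij} \cdot (cn_j / μ_j)*, not the package's code: the matrix `R` below is made-up, strictly positive toy data rather than actual regression output, and the `cn` values are invented.

```r
set.seed(1)

# Made-up stand-in for the residual matrix R (cells in rows, segments in
# columns); values are kept positive so that every per-segment mean is non-zero.
R <- matrix(runif(20, min = 0.5, max = 1.5), nrow = 5,
            dimnames = list(paste0("cell", 1:5), paste0("seg", 1:4)))

# Invented population-average copy numbers, one per segment
cn <- c(seg1 = 2, seg2 = 3, seg3 = 1, seg4 = 4)

mu <- colMeans(R)                 # mu_j: mean residual per segment across cells
CN <- sweep(R, 2, cn / mu, `*`)   # CN_ij = R_ij * (cn_j / mu_j)

# After rescaling, each segment's mean across cells equals cn_j
colMeans(CN)
```

Because every column is multiplied by a single factor, the relative differences between cells within a segment are preserved; only the segment's mean changes.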

Let *x' \in CN* be the measured copy number of a given cell-segment pair, and *x* its corresponding true copy number state. Further, let CNF be the matrix of assigned copy number states per segment per cell. The probability of assigning copy number *x* to a cell *i* at locus *j* depends on:

**A. Cell i's read count at locus j**, calculated conditional on the measurement *x'*, as *P_A(x|x')*.

**B.** *P_B(x)*, the cumulative confidence of the association rules in support of *x*.

We first assign

*CNF_{ij} := argmax_{x \in [1,8]} P_A(x|x') + P_B(x)*
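
As a hedged illustration of this assignment rule, the snippet below picks the state that maximizes the sum of the two scores for a single cell-segment pair. The vectors `pA` and `pB` are made-up numbers standing in for *P_A(x|x')* and *P_B(x)*; how the package actually computes them is not reproduced here.

```r
states <- 1:8   # candidate copy number states, x in [1, 8]

# Made-up scores for one cell-segment pair (illustration only):
# pA stands in for P_A(x | x'), pB for P_B(x).
pA <- c(0.05, 0.40, 0.35, 0.10, 0.05, 0.03, 0.01, 0.01)
pB <- c(0.00, 0.10, 0.30, 0.05, 0.00, 0.00, 0.00, 0.00)

score  <- pA + pB
cnf_ij <- states[which.max(score)]   # CNF_ij = argmax_x P_A(x|x') + P_B(x)
cnf_ij
```

With these toy numbers, `pA` alone would favor state 2, while the combined score favors state 3, which shows how support from the rules can change the assignment.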

Segment-by-cell matrix of copy number states.

```
## Calculate the number of genes expressed per cell:
data(epg)
gpc = apply(epg > 0, 2, sum)
## Call function:
data(eps)
data(segments)
cn = segments[rownames(eps), "CN_Estimate"]
cnps = segmentExpression2CopyNumber(eps, gpc, cn, seed=0.5, nCores=2, stdOUT="log")
head(eps[, 1:5])   ## Expression of first five cells
head(cnps[, 1:5])  ## Copy number of first five cells
```
