scirpy.tl.clonotype_modularity

scirpy.tl.clonotype_modularity(adata, target_col='clone_id', connectivity_key='connectivities', permutation_test='approx', n_permutations=None, key_added='clonotype_modularity', inplace=True, fdr_correction=True, random_state=0)

Identifies clonotypes or clonotype clusters whose cells are more transcriptionally related than expected by chance by computing the clonotype modularity.

For each clonotype, we compare the number of edges connecting the cells belonging to that clonotype in the transcriptomics neighborhood graph with the number of edges expected by chance in a subgraph of the same size.

We define the connectivity score as the log2 of the ratio of actual to expected edges. A pseudocount of 1 is added to cope with small subgraphs with 0 expected edges. Intuitively, a clonotype modularity of 1 means that there are twice as many edges in the neighborhood graph as expected by chance.

\[\text{connectivity score} = \log_2 \frac{ |E|_{\text{actual}} + 1 }{ |E|_{\text{expected}} + 1 }\]
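As a worked illustration of the formula above, the following minimal sketch evaluates the score for a few edge counts. The helper name connectivity_score and the edge counts are hypothetical and chosen only for illustration; this is not the library's internal code.

```python
import math


def connectivity_score(actual_edges: float, expected_edges: float) -> float:
    """log2 ratio of actual to expected edges, with a pseudocount of 1."""
    return math.log2((actual_edges + 1) / (expected_edges + 1))


# Hypothetical edge counts, for illustration only.
twice_as_connected = connectivity_score(7, 3)  # (7 + 1) / (3 + 1) = 2
empty_subgraph = connectivity_score(0, 0)      # (0 + 1) / (0 + 1) = 1
half_as_connected = connectivity_score(1, 3)   # (1 + 1) / (3 + 1) = 0.5

print(twice_as_connected, empty_subgraph, half_as_connected)
```

The pseudocount keeps the score defined (and equal to 0) even when both the actual and the expected edge count are 0.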

For each unique clonotype size, the expected number of edges is derived by randomly sampling n_permutations subgraphs from the transcriptomics neighborhood graph. This background distribution is also used to calculate p-values for the connectivity scores. By choosing permutation_test="approx", a negative binomial distribution is fitted to the background distribution and used to calculate p-values.
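The sampling procedure described above can be sketched on a toy graph using only the standard library. This is a simplified illustration, not the scirpy implementation: the graph, the helper names (count_edges, permutation_background, exact_permutation_test), taking the mean of the background as the expected edge count, and flooring the p-value at 1/n_permutations are all assumptions. Only the exact variant is shown; the negative binomial fit used by the approximate test is omitted.

```python
import math
import random
from itertools import combinations


def count_edges(graph, nodes):
    """Count undirected edges of `graph` with both endpoints in `nodes`."""
    members = sorted(set(nodes))
    return sum(1 for a, b in combinations(members, 2) if b in graph[a])


def permutation_background(graph, size, n_permutations, seed=0):
    """Edge counts of `n_permutations` random subgraphs with `size` nodes each."""
    rng = random.Random(seed)
    all_nodes = sorted(graph)
    return [
        count_edges(graph, rng.sample(all_nodes, size))
        for _ in range(n_permutations)
    ]


def exact_permutation_test(graph, clonotype_cells, n_permutations=1000, seed=0):
    """Return (connectivity score, empirical p-value) for one clonotype."""
    actual = count_edges(graph, clonotype_cells)
    background = permutation_background(
        graph, len(clonotype_cells), n_permutations, seed
    )
    # Assumption: the expected edge count is the mean of the background.
    expected = sum(background) / n_permutations
    score = math.log2((actual + 1) / (expected + 1))
    # Assumption: count random subgraphs at least as connected as the
    # clonotype, floored at one so the p-value never drops below 1/n.
    n_extreme = sum(1 for b in background if b >= actual)
    p_value = max(n_extreme, 1) / n_permutations
    return score, p_value


# Toy neighborhood graph: cells 0-3 form a clique, cells 3-11 form a chain.
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
edges += [(i, i + 1) for i in range(3, 11)]
graph = {i: set() for i in range(12)}
for a, b in edges:
    graph[a].add(b)
    graph[b].add(a)

# Hypothetical clonotype made up of the four clique cells.
score, p_value = exact_permutation_test(graph, [0, 1, 2, 3], n_permutations=1000)
print(score, p_value)
```

Because the background depends only on the subgraph size, it can be computed once per unique clonotype size and reused for every clonotype of that size.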

The clonotype_modularity function is inspired by CoNGA [SGC+21]. However, while CoNGA creates “conga clusters” based on cells that share edges in the TCR and transcriptomics neighborhood graphs, clonotype_modularity uses predefined clonotype clusters and checks whether, within those clusters, the transcriptomics neighborhood graph is more connected than expected by chance.

Warning

This is an experimental function that may change in the future.

Parameters
adata

Annotated data matrix.

target_col

Column in adata.obs containing the clonotype annotation.

connectivity_key

Key in adata.obsp containing the transcriptomics neighborhood graph connectivity matrix.

permutation_test : Literal['approx', 'exact'] (default: 'approx')

Whether to perform an approximate or exact permutation test. If the approximate permutation test is used, the result of fewer permutations is used to fit a negative binomial distribution, from which p-values are derived.

n_permutations : int | None (default: None)

Number of permutations used for the permutation test. Defaults to 1000 for the approx test and to 10000 for the exact test. Note that for the exact test, the minimum achievable p-value is 1/n, where n is the number of permutations.

key_added : str (default: 'clonotype_modularity')

Key under which the result will be stored in adata.obs if inplace is True.

fdr_correction : bool (default: True)

Whether to adjust the p-values for multiple testing using false-discovery-rate (FDR) correction.

random_state : int (default: 0)

Random seed for the permutation test.
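To illustrate what the fdr_correction option does conceptually, the sketch below adjusts a list of p-values with the Benjamini-Hochberg procedure. That scirpy uses this particular procedure is an assumption; the parameter description only states that an FDR correction is applied. The helper name benjamini_hochberg and the input values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    n = len(pvalues)
    # Indices of the p-values, from smallest to largest.
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values, one per clonotype.
raw = [0.01, 0.04, 0.03, 0.5]
fdr = benjamini_hochberg(raw)
print(fdr)
```

Adjusted values are never smaller than the raw p-values, so fewer clonotypes pass a given threshold after correction.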

Return type

Tuple[Dict[str, float], Dict[str, float]] | None

Returns

If inplace is False, returns two dictionaries mapping the clonotype id onto a single modularity score and p-value per clonotype. Otherwise, adds two columns to adata.obs:

  • adata.obs["{key_added}"]: the modularity scores for each cell

  • adata.obs["{key_added}_pvalue"] or adata.obs["{key_added}_fdr"] with the raw p-values or false discovery rates, respectively, depending on the value of fdr_correction.

and a dictionary to adata.uns

  • adata.uns["{key_added}"]: A dictionary holding the parameters this function was called with.
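When inplace is False, the two returned dictionaries can be post-processed directly. The sketch below filters and ranks clonotypes from such a pair of dictionaries. The helper significant_clonotypes, the threshold, and the example values are hypothetical, and it assumes that the scores dictionary is returned first and the p-values (or FDR values) second.

```python
def significant_clonotypes(scores, pvalues, alpha=0.05):
    """Return (clonotype id, score, p-value) tuples with p-value < alpha,
    sorted by descending modularity score."""
    hits = [
        (cid, scores[cid], pvalues[cid])
        for cid in scores
        if pvalues[cid] < alpha
    ]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# In practice the dictionaries would come from a call such as
#   scores, pvalues = scirpy.tl.clonotype_modularity(adata, inplace=False)
# The values below are made up for illustration.
scores = {"0": 2.1, "1": 0.3, "2": 1.4}
pvalues = {"0": 0.001, "1": 0.62, "2": 0.03}

top = significant_clonotypes(scores, pvalues)
for clonotype_id, score, pvalue in top:
    print(clonotype_id, score, pvalue)
```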