scirpy.tl.clonotype_modularity
- scirpy.tl.clonotype_modularity(adata, target_col='clone_id', connectivity_key='connectivities', permutation_test='approx', n_permutations=None, key_added='clonotype_modularity', inplace=True, fdr_correction=True, random_state=0)
Identifies clonotypes or clonotype clusters consisting of cells that are more transcriptionally related than expected by chance by computing the Clonotype modularity.
For each clonotype, we compare the number of edges connecting the cells belonging to that clonotype in the transcriptomics neighborhood graph with the number of edges expected by chance in a subgraph of the same size.
We define the connectivity score as the log2 of the ratio of actual to expected edges. A pseudocount of 1 is added to cope with small subgraphs with 0 expected edges. Intuitively, a clonotype modularity of 1 means that there are twice as many edges in the neighborhood graph as expected by chance.
\[\text{connectivity score} = \log_2 \frac{ |E|_{\text{actual}} + 1 }{ |E|_{\text{expected}} + 1 }\]
For each unique clonotype size, the expected number of edges is derived by randomly sampling `n_permutations` subgraphs from the transcriptomics neighborhood graph. This background distribution is also used to calculate p-values for the connectivity scores. By choosing `permutation_test="approx"`, a negative binomial distribution is fitted to the background distribution and used to calculate p-values.
The `clonotype_modularity` function is inspired by CoNGA [SGC+21]. However, while CoNGA creates “conga clusters” based on cells that share edges in the TCR and transcriptomics neighborhood graph, `clonotype_modularity` uses predefined clonotype clusters and checks whether, within those clusters, the transcriptomics neighborhood graph is more connected than expected by chance.
Warning
This is an experimental function that may change in the future.
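The connectivity score formula above can be checked by hand with a few lines of Python. This is a minimal sketch of the formula only, not scirpy's implementation; the function name `connectivity_score` and the edge counts are made up for illustration.

```python
import math


def connectivity_score(actual_edges: float, expected_edges: float) -> float:
    """log2 ratio of actual to expected edges, with a pseudocount of 1."""
    return math.log2((actual_edges + 1) / (expected_edges + 1))


# 7 actual vs. 3 expected edges: (7 + 1) / (3 + 1) = 2, so the score is 1.
score_enriched = connectivity_score(7, 3)

# No edges observed and none expected: the pseudocount keeps the ratio
# defined, and the score is 0 (no enrichment).
score_empty = connectivity_score(0, 0)

print(score_enriched, score_empty)
```

Because of the pseudocount, a score of 1 corresponds to exactly twice as many edges only approximately; the approximation improves as the edge counts grow.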
- Parameters
- adata
Annotated data matrix.
- target_col
Column in `adata.obs` containing the clonotype annotation.
- connectivity_key
Key in `adata.obsp` containing the transcriptomics neighborhood graph connectivity matrix.
- permutation_test : Literal['approx', 'exact'] (default: 'approx')
Whether to perform an approximate or exact permutation test. If the approximate permutation test is used, the result of fewer permutations is used to fit a negative binomial distribution, from which p-values are derived.
- n_permutations : Optional[int] (default: None)
Number of permutations used for the permutation test. Defaults to 1000 for the approx test, and to 10000 for the exact test. Note that for the exact test, the minimum achievable p-value is 1/n, where n is the number of permutations.
- key_added : str (default: 'clonotype_modularity')
Key under which the result will be stored in `adata.obs` if `inplace` is True.
- fdr_correction : bool (default: True)
Whether to adjust the p-values for multiple testing using false-discovery-rate (FDR) correction.
- random_state : int (default: 0)
Random seed for the permutation test.
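To make the roles of `n_permutations`, `random_state` and the exact test more concrete, here is a rough pure-Python sketch of a permutation test on a toy graph. It is not scirpy's implementation: the helper names, the toy graph, and the choice to floor the empirical p-value at 1/n are all assumptions made for illustration.

```python
import math
import random
from itertools import combinations


def count_edges(nodes, edges):
    """Number of graph edges with both endpoints in `nodes`."""
    node_set = set(nodes)
    return sum(1 for a, b in edges if a in node_set and b in node_set)


def permutation_background(all_nodes, edges, size, n_permutations, seed=0):
    """Edge counts of `n_permutations` random subgraphs with `size` nodes."""
    rng = random.Random(seed)
    return [
        count_edges(rng.sample(all_nodes, size), edges)
        for _ in range(n_permutations)
    ]


def exact_test(actual, background):
    """Connectivity score and empirical p-value from the background."""
    n = len(background)
    expected = sum(background) / n
    score = math.log2((actual + 1) / (expected + 1))
    n_extreme = sum(1 for b in background if b >= actual)
    # Assumption: floor the p-value at 1/n so it is never reported as 0.
    p_value = max(n_extreme, 1) / n
    return score, p_value


# Toy graph: 20 cells; the 5 cells of one hypothetical clonotype are fully
# connected to each other (10 edges), and there are no other edges.
all_nodes = list(range(20))
clone = [0, 1, 2, 3, 4]
edges = list(combinations(clone, 2))

actual = count_edges(clone, edges)
background = permutation_background(all_nodes, edges, len(clone), 200)
score, p_value = exact_test(actual, background)
print(actual, score, p_value)
```

With only 200 permutations the p-value in this sketch cannot drop below 1/200, which illustrates why the exact test defaults to a larger number of permutations than the approximate one.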
- Return type
Optional[Tuple[Dict[str, float], Dict[str, float]]]
- Returns
If `inplace` is False, returns two dictionaries mapping the clonotype id onto a single modularity score and p-value per clonotype. Otherwise, adds two columns to `adata.obs`:
- `adata.obs["{key_added}"]`: the modularity scores for each cell
- `adata.obs["{key_added}_pvalue"]` or `adata.obs["{key_added}_fdr"]`: the raw p-values or false discovery rates, respectively, depending on the value of `fdr_correction`
and a dictionary to `adata.uns`:
- `adata.uns["{key_added}"]`: a dictionary holding the parameters this function was called with.
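When the function is called with `inplace=False`, the two returned dictionaries can be combined to shortlist clonotypes. The sketch below works on plain dictionaries of the documented shape (clonotype id mapped to a float); the helper `significant_clonotypes`, the threshold `alpha`, and all values are made up for illustration and are not part of scirpy.

```python
def significant_clonotypes(scores, pvalues, alpha=0.05):
    """Clonotypes with p-value (or FDR) below `alpha`, highest score first."""
    hits = [(cid, scores[cid], p) for cid, p in pvalues.items() if p < alpha]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Made-up stand-ins for the two returned dictionaries.
scores = {"0": 2.1, "1": -0.3, "2": 0.9}
pvalues = {"0": 0.001, "1": 0.8, "2": 0.04}

hits = significant_clonotypes(scores, pvalues)
for clonotype_id, score, p in hits:
    print(clonotype_id, score, p)
```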