AnnData and MuData
In brief, AnnData objects represent annotated datasets: a main data matrix together with rich annotations, which can include tables and arrays. MuData objects represent collections of AnnData objects. They are aimed primarily, but not exclusively, at cases where different AnnData objects hold different sets of features profiled for the same samples.
Both AnnData and MuData were originally implemented in Python.
AnnData
The AnnData implementation in Muon.jl mainly follows the reference Python implementation. Some differences in how these objects are implemented and how they behave stem from differences in how the two languages are designed and operate.
AnnData objects can be stored in and read from .h5ad files.
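Here is a minimal round-trip sketch. It assumes Muon.jl's `writeh5ad(filename, adata)` and `readh5ad(filename)` functions, which are not shown elsewhere on this page, and a writable temporary directory:

```julia
using Muon

# Small object to write out; obs/var names default to "1", "2", ...
ad_io = AnnData(X=rand(4, 3))

# Write to a temporary .h5ad file (assumed signature: filename first, object second)
path = joinpath(mktempdir(), "example.h5ad")
writeh5ad(path, ad_io)

# Read it back into memory
ad_back = readh5ad(path)

# The shape and the matrix values should survive the round trip
size(ad_back)
ad_back.X == ad_io.X
```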
Creating AnnData objects
A simple 2D array is already enough to initialize an annotated data object:
```julia
using Muon

x = rand(10, 2) * rand(2, 5);
ad = AnnData(X=x)
```
```
AnnData object 10 ✕ 5
```
Observations correspond to the rows of the matrix and have unique names:
```julia
ad.obs_names .= "obs_" .* ad.obs_names
```
```
10-element Muon.Index{String, UInt8}:
"obs_1"
"obs_2"
"obs_3"
"obs_4"
"obs_5"
"obs_6"
"obs_7"
"obs_8"
"obs_9"
"obs_10"
```
Arrays aligned with the observations (one row per observation) are stored in the .obsm slot:
```julia
using LinearAlgebra  # provides svd and Diagonal

f = svd(x);
ad.obsm["X_svd"] = f.U * Diagonal(f.S)
```
```
10×5 Matrix{Float64}:
-0.250144 0.0346292 -7.50869e-16 -7.4642e-18 -2.33275e-18
-0.623616 0.0887204 -1.28518e-17 -4.86351e-18 5.24508e-18
-0.755096 -0.0622184 -2.67195e-17 2.94538e-17 2.07435e-17
-0.0408174 -0.00529456 4.82673e-19 -7.56459e-20 4.07618e-20
-1.66181 -0.0186622 -9.80952e-18 5.37881e-17 -3.05055e-18
-2.04166 0.0332473 3.92465e-17 -3.65628e-17 1.97402e-18
-1.20915 0.269847 4.76464e-17 5.43713e-17 2.57661e-18
-1.95812 -0.0495714 8.34887e-17 6.60987e-17 -1.21198e-17
-1.50823 -0.16164 -9.86383e-17 5.74088e-17 3.25277e-18
-2.16141 -0.0180843 3.6942e-17 -1.45198e-16 -7.41456e-19
```
When data is assigned, Muon.jl first verifies that its dimensions match those of the object.
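Here is a small sketch of that check in action. It assumes that a mismatched assignment raises an exception, which is what the check implies. The exact exception type is not documented here, so the code catches any error and prints its type:

```julia
using Muon

ad_chk = AnnData(X=rand(10, 5))

# 10 rows: matches the 10 observations, so this assignment is accepted
ad_chk.obsm["ok"] = rand(10, 2)

# 8 rows: does not match the number of observations
rejected = try
    ad_chk.obsm["bad"] = rand(8, 2)
    false
catch err
    # the exception type is an assumption; print it to see what Muon.jl raises
    println("assignment rejected: ", typeof(err))
    true
end
```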
Slicing AnnData objects
Just like ordinary arrays, AnnData objects can be subset by slicing. The first dimension corresponds to observations and the second to variables:
```julia
obs_sub = "obs_" .* string.(1:3)
ad_sub = ad[obs_sub, :]
```
```
AnnData object 3 ✕ 5
```
Since the dimensions are labelled, names are a natural way to subset these objects, but Boolean and integer arrays work as well:
```julia
# both select obs_1 and obs_3 and return the same subset
ad_sub[[true, false, true], :]
ad_sub[[1, 3], :]
```
```
AnnData object 2 ✕ 5
```
Operations on AnnData objects
Duplicate var_names can be made unique by appending a numbered suffix.
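Here is a plain-Julia sketch of one such scheme that makes "appending a numbered suffix" concrete. It is not Muon.jl's code. The helper `make_unique_sketch` is hypothetical, and the `-1`, `-2` suffix format is an assumption modelled on Python anndata. The first occurrence of a name is kept as-is, and later duplicates receive the next suffix that does not clash with any existing name:

```julia
# Hypothetical helper illustrating suffix-based de-duplication;
# not Muon.jl's implementation.
function make_unique_sketch(names::AbstractVector{<:AbstractString}; sep::AbstractString="-")
    taken = Set{String}(names)          # every name already in use
    last_suffix = Dict{String,Int}()    # last suffix handed out per base name
    out = String[]
    for n in names
        if !haskey(last_suffix, n)
            # first occurrence: keep the name unchanged
            last_suffix[n] = 0
            push!(out, n)
        else
            # later occurrence: try n-1, n-2, ... until an unused name is found
            k = last_suffix[n] + 1
            candidate = string(n, sep, k)
            while candidate in taken
                k += 1
                candidate = string(n, sep, k)
            end
            last_suffix[n] = k
            push!(taken, candidate)
            push!(out, candidate)
        end
    end
    return out
end

make_unique_sketch(["a", "b", "a", "a"])   # ["a", "b", "a-1", "a-2"]
```

In Muon.jl itself, the operation is a single in-place call on the object: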
```julia
var_names_make_unique!(ad)
```
```
AnnData object 10 ✕ 5
```
Similarly, obs_names can be made unique:
```julia
obs_names_make_unique!(ad)
```
```
AnnData object 10 ✕ 5
```
The data matrix of an AnnData object can be converted to a DataFrame labelled with the obs and var names.
```julia
using DataFrames
DataFrame(ad)
```
| Row | obs | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|---|
| | String | Float64 | Float64 | Float64 | Float64 | Float64 |
| 1 | obs_1 | 0.120051 | 0.0245721 | 0.16205 | 0.0508219 | 0.141111 |
| 2 | obs_2 | 0.298174 | 0.0609995 | 0.404033 | 0.125763 | 0.353667 |
| 3 | obs_3 | 0.440325 | 0.0922786 | 0.486468 | 0.218833 | 0.295128 |
| 4 | obs_4 | 0.0247048 | 0.00519788 | 0.0262652 | 0.0125869 | 0.0144381 |
| 5 | obs_5 | 0.91379 | 0.190246 | 1.07253 | 0.435207 | 0.742309 |
| 6 | obs_6 | 1.09641 | 0.227632 | 1.3186 | 0.512647 | 0.956059 |
| 7 | obs_7 | 0.532422 | 0.107654 | 0.78498 | 0.205468 | 0.762491 |
| 8 | obs_8 | 1.08962 | 0.227162 | 1.26332 | 0.523629 | 0.853027 |
| 9 | obs_9 | 0.896971 | 0.188374 | 0.971068 | 0.451758 | 0.560174 |
| 10 | obs_10 | 1.18562 | 0.246769 | 1.39508 | 0.56362 | 0.970332 |
By default, the first column obs corresponds to the obs_names and the remaining columns are named according to the var_names. To obtain the transpose of this, pass columns=:obs.
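Comparing shapes is a quick way to see the two orientations. The sketch below builds a fresh 10 ✕ 5 object. Following the description above, the default layout has one row per observation plus the obs column, and columns=:obs flips it to one row per variable. Only the row count of the transposed frame is checked, because the name of its first column is not documented here:

```julia
using Muon, DataFrames

ad_df = AnnData(X=rand(10, 5))

df = DataFrame(ad_df)                   # one row per observation
df_t = DataFrame(ad_df, columns=:obs)   # one row per variable

size(df)     # (10, 6): the obs column plus 5 variable columns
nrow(df_t)   # 5
```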
To use a different data matrix (the default is ad.X), pass the name of the layer:
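```julia
# assumes a layer named "raw" exists; for illustration, create one from X
ad.layers["raw"] = copy(ad.X)
DataFrame(ad, layer="raw")
```
MuData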
The basic idea behind a multimodal object is a key → value relationship. The keys are the unique names of individual modalities, and the values are AnnData objects that hold the corresponding data. Like AnnData objects, MuData objects can also carry rich multimodal annotations.
```julia
using Distributions  # provides Binomial

ad2 = AnnData(X=rand(Binomial(1, 0.3), (10, 7)),
              obs_names="obs_" .* string.(1:10))
md = MuData(mod=Dict("view_rand" => ad, "view_binom" => ad2))
```
```
MuData object 10 ✕ 12
└ view_rand
AnnData object 10 ✕ 5
└ view_binom
AnnData object 10 ✕ 7
```
Features are treated as unique to each modality.
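Because features are scoped to their modality, identical variable names in two modalities are not merged. In the sketch below, both modalities keep the default variable names ("1", "2" and so on), yet the MuData object counts 5 + 7 = 12 features. The sketch assumes that `size` is defined for MuData objects, as the `10 ✕ 12` display above suggests:

```julia
using Muon

# Two modalities for the same 10 observations; both use the default
# var_names "1", "2", ..., so the names overlap
mod_a = AnnData(X=rand(10, 5))
mod_b = AnnData(X=rand(10, 7))

md_demo = MuData(mod=Dict("a" => mod_a, "b" => mod_b))

# Overlapping names are not collapsed: 5 + 7 = 12 features in total
size(md_demo)   # expected (10, 12)
```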
Slicing MuData objects
Slicing a MuData object subsets all of its modalities at once:
```julia
md[["obs_1", "obs_9"], :]
```
```
MuData object 2 ✕ 12
└ view_rand
AnnData object 2 ✕ 5
└ view_binom
AnnData object 2 ✕ 7
```
Multimodal annotation
Annotations can also be stored at the multimodal level, including multidimensional arrays:
```julia
md.obsm["X_svd"] = f.U * Diagonal(f.S);
md.obsm
```
```
Muon.AlignedMapping{Tuple{1 => 1}, String, MuData, true, Union{DataFrame, AbstractArray{<:Number}, AbstractArray{Union{Missing, T}} where T<:Number}} with 3 entries:
"X_svd" => [-0.250144 0.0346292 … -7.4642e-18 -2.33275e-18; -0.623616 0.…
"view_rand" => Bool[1; 1; … ; 1; 1;;]
"view_binom" => Bool[1; 1; … ; 1; 1;;]
```
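The listing above shows that md.obsm holds one Boolean entry per modality (view_rand, view_binom) alongside user-added arrays. Every value in them is true here, which suggests they mark the observations present in each modality; treat that reading as an assumption. The self-contained sketch below stores a joint embedding and inspects one of these masks. The SVD of both modalities placed side by side is an illustrative choice, not a Muon.jl feature:

```julia
using Muon, LinearAlgebra

ma = AnnData(X=rand(10, 5))
mb = AnnData(X=rand(10, 7))
md_joint = MuData(mod=Dict("a" => ma, "b" => mb))

# Illustrative joint embedding: thin SVD of both modalities side by side
joint = hcat(ma.X, mb.X)   # 10 × 12
fj = svd(joint)            # thin SVD: U is 10 × 10, S has 10 entries
md_joint.obsm["X_joint_svd"] = fj.U * Diagonal(fj.S)

# Per-modality entry that MuData adds to obsm by itself
mask_a = md_joint.obsm["a"]
all(mask_a)   # every observation is present in modality "a"
```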