{ "cells": [ { "cell_type": "markdown", "id": "b6eae7bd-1091-480f-8c95-551eefe5c53c", "metadata": { "slideshow": { "slide_type": "slide" }, "tags": [] }, "source": [ "# Shadows features" ] }, { "cell_type": "code", "execution_count": 1, "id": "b17e6265-4c91-4d30-a232-20e6a627c07d", "metadata": {}, "outputs": [], "source": [ "%load_ext autoreload\n", "%autoreload 2" ] }, { "cell_type": "code", "execution_count": 2, "id": "4aa723fb-6a8d-4d43-913c-a31f2316b02f", "metadata": {}, "outputs": [], "source": [ "import os\n", "os.chdir(\"../../\")" ] }, { "cell_type": "code", "execution_count": 3, "id": "f1c3418a-3a90-41b0-baa6-c6ad340dc75f", "metadata": {}, "outputs": [], "source": [ "from pathlib import Path\n", "data = Path(\"data/\")" ] }, { "cell_type": "markdown", "id": "b9e3bb66-3928-45f4-ba98-fded629de018", "metadata": {}, "source": [ " " ] }, { "cell_type": "markdown", "id": "934b8d69-b812-422f-b718-080bb8508348", "metadata": { "slideshow": { "slide_type": "slide" }, "tags": [] }, "source": [ "## Shadow objects and their features\n", "\n", "While shadow objects provide a convenient read-only drop-in replacement for AnnData/MuData objects when needed, they also have additional features that can help users make the most of *shadows*." 
] }, { "cell_type": "markdown", "id": "65462d07-01b0-4395-8891-eda01e472f38", "metadata": {}, "source": [ " " ] }, { "cell_type": "markdown", "id": "4a38075c-8da2-4193-af1a-c52e18176f92", "metadata": { "slideshow": { "slide_type": "fragment" }, "tags": [] }, "source": [ "Import classes for these shadow objects:" ] }, { "cell_type": "code", "execution_count": 4, "id": "079454ed-10dc-47ef-9de2-ef70f95dbed6", "metadata": { "slideshow": { "slide_type": "fragment" }, "tags": [] }, "outputs": [], "source": [ "from shadows import AnnDataShadow, MuDataShadow" ] }, { "cell_type": "markdown", "id": "564f7b2b-063d-4f0e-8333-c178565ee2d2", "metadata": {}, "source": [ " " ] }, { "cell_type": "markdown", "id": "6b819452-470f-47b7-8fa0-0c8304fd557c", "metadata": { "slideshow": { "slide_type": "fragment" }, "tags": [] }, "source": [ "Initialise a multimodal shadow object:" ] }, { "cell_type": "code", "execution_count": 5, "id": "3ff358c0-2c77-460a-97a9-398f615a0e17", "metadata": { "slideshow": { "slide_type": "fragment" }, "tags": [] }, "outputs": [], "source": [ "file = data / \"pbmc5k_citeseq/pbmc5k_citeseq_processed.h5mu\"\n", "mdata = MuDataShadow(file)" ] }, { "cell_type": "markdown", "id": "1747c671-ffc2-4d4d-8a04-7dc44432b2fb", "metadata": {}, "source": [ " " ] }, { "cell_type": "markdown", "id": "b8ae6d73-9a74-48ed-9d41-7e92bfee8f71", "metadata": { "slideshow": { "slide_type": "slide" }, "tags": [] }, "source": [ "### File\n", "\n", "The file connection that the shadow is using can be accessed via the `.file` attribute:" ] }, { "cell_type": "code", "execution_count": 6, "id": "33c47ede-e566-43ac-8596-470263d21b3a", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "mdata.file" ] }, { "cell_type": "markdown", "id": "a43127df-c330-4104-bbf6-399c7392c373", "metadata": {}, "source": [ "The name of the file can then be accessed via" ] }, { "cell_type": "code", 
"execution_count": 7, "id": "a7d549f2-ec47-4744-a744-e2f7884638d7", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'data/pbmc5k_citeseq/pbmc5k_citeseq_processed.h5mu'" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "mdata.file.filename" ] }, { "cell_type": "markdown", "id": "0574136f-7aa4-4a1e-9312-eee5fc9c6744", "metadata": { "slideshow": { "slide_type": "subslide" }, "tags": [] }, "source": [ "The connection stays open until `mdata.close()` is called" ] }, { "cell_type": "code", "execution_count": 8, "id": "1c1f47db-f933-4999-8fae-cb088b56dab5", "metadata": {}, "outputs": [], "source": [ "mdata.close()" ] }, { "cell_type": "markdown", "id": "a87e0e96-86c2-4623-b239-892e92b04a5a", "metadata": { "slideshow": { "slide_type": "fragment" }, "tags": [] }, "source": [ "... or until the file has to be re-opened for modification (see below)." ] }, { "cell_type": "markdown", "id": "5a064df4-b533-4124-a85a-f7b20fcc1091", "metadata": {}, "source": [ " " ] }, { "cell_type": "markdown", "id": "9beb85a9-e226-4b9a-949b-2351432558f7", "metadata": { "slideshow": { "slide_type": "slide" }, "tags": [] }, "source": [ "### Permissions\n", "\n", "We can open HDF5 files in different modes including purely read-only (`'r'`) and read/write (`'r+'`). 
The mode can be provided to the constructor:" ] }, { "cell_type": "code", "execution_count": 9, "id": "9f297beb-97b5-46ad-97b9-2dedc5c40b53", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'r'" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "mdata = MuDataShadow(file, mode=\"r\")\n", "mdata.file.mode" ] }, { "cell_type": "markdown", "id": "fc9da2a5-402f-4fe8-83a2-0a5f06a84d7c", "metadata": {}, "source": [ "Let's add some data to the in-memory shadow object:" ] }, { "cell_type": "code", "execution_count": 10, "id": "21f291bd-7c5d-4ef3-a034-c0030dabdb60", "metadata": { "slideshow": { "slide_type": "fragment" }, "tags": [] }, "outputs": [], "source": [ "mdata[\"rna\"].obsm[\"X_pca_copy\"] = mdata[\"rna\"].obsm[\"X_pca\"].copy()" ] }, { "cell_type": "markdown", "id": "b03108f5-0e8a-4646-af12-ef5fc934885b", "metadata": { "slideshow": { "slide_type": "subslide" }, "tags": [] }, "source": [ "We can also conveniently close and reopen the connection for a given in-memory shadow object:" ] }, { "cell_type": "code", "execution_count": 11, "id": "e8ddb228-74b4-4f8e-8cdc-c84479f38d2d", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'r+'" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "mdata.reopen(mode=\"r+\")\n", "mdata.file.mode" ] }, { "cell_type": "markdown", "id": "48157734-adc0-4e7d-8157-64e1201b6fba", "metadata": {}, "source": [ "This way all the newly added elements are still available in memory:" ] }, { "cell_type": "code", "execution_count": 12, "id": "043428b5-dc58-4d0c-b653-e1d8451b39f9", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "obsm:\tX_pcaᐁ, X_umap, X_pca_copy▲" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "mdata[\"rna\"].obsm" ] }, { "cell_type": "code", "execution_count": 13, "id": "50aba055-06e2-490d-a1a6-3307ef7ac6d0", "metadata": { "slideshow": { "slide_type": "fragment" }, "tags": [] 
}, "outputs": [], "source": [ "# Clean up\n", "mdata.close()\n", "del mdata" ] }, { "cell_type": "markdown", "id": "991ccc6a-f182-4689-802d-a9ae70a490e4", "metadata": {}, "source": [ " " ] }, { "cell_type": "markdown", "id": "2dbc52ad-6010-416f-810b-c60e5546ba7b", "metadata": { "slideshow": { "slide_type": "slide" }, "tags": [] }, "source": [ "### Individual modality access\n", "\n", "Individual modalities stored in the .h5mu files can be accessed as part of the `MuDataShadow` object:" ] }, { "cell_type": "code", "execution_count": 14, "id": "d5ea1511-6f1b-4c51-9ec7-14365dc8d391", "metadata": { "slideshow": { "slide_type": "fragment" }, "tags": [] }, "outputs": [ { "data": { "text/plain": [ "AnnData Shadow object with n_obs × n_vars = 3891 × 17806\n", " X \n", " raw:\tX, var, varm\n", " obs:\t_index, celltype, leiden, n_genes_by_counts, pct_counts_mt, total_counts, total_counts_mt\n", " var:\t_index, dispersions, dispersions_norm, feature_types, gene_ids, highly_variable, mean, mean_counts, means, mt, n_cells_by_counts, pct_dropout_by_counts, std, total_counts\n", " obsm:\tX_pca, X_umap\n", " varm:\tPCs\n", " obsp:\tconnectivities, distances\n", " uns:\tcelltype_colors, hvg, leiden, leiden_colors, neighbors, pca, rank_genes_groups, umap" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "mdata = MuDataShadow(file, mode=\"r\")\n", "mdata[\"rna\"]" ] }, { "cell_type": "markdown", "id": "60d08ba8-c7c4-4d13-a5fe-9f39c56dd86a", "metadata": { "slideshow": { "slide_type": "subslide" }, "tags": [] }, "source": [ "Moreover, one can also create a direct connection to a specific modality:" ] }, { "cell_type": "code", "execution_count": 15, "id": "a853493d-5432-438f-bd8f-837cb63d151a", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "AnnData Shadow object with n_obs × n_vars = 3891 × 17806\n", " X \n", " raw:\tX, var, varm\n", " obs:\t_index, celltype, leiden, n_genes_by_counts, pct_counts_mt, total_counts, 
total_counts_mt\n", " var:\t_index, dispersions, dispersions_norm, feature_types, gene_ids, highly_variable, mean, mean_counts, means, mt, n_cells_by_counts, pct_dropout_by_counts, std, total_counts\n", " obsm:\tX_pca, X_umap\n", " varm:\tPCs\n", " obsp:\tconnectivities, distances\n", " uns:\tcelltype_colors, hvg, leiden, leiden_colors, neighbors, pca, rank_genes_groups, umap" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "mdata.close()\n", "del mdata\n", "\n", "adata = AnnDataShadow(file / \"mod/rna\")\n", "adata" ] }, { "cell_type": "code", "execution_count": 16, "id": "946d03a9-d0d1-4ebc-ae29-92d795f08073", "metadata": { "slideshow": { "slide_type": "fragment" }, "tags": [] }, "outputs": [], "source": [ "# Clean up\n", "adata.close()\n", "del adata" ] }, { "cell_type": "markdown", "id": "14b8ad11-adad-4ea8-9146-3dd7cd9bd415", "metadata": {}, "source": [ " " ] }, { "cell_type": "markdown", "id": "d3ae2a84-34fc-48b9-926e-a5d5f57e4e73", "metadata": { "slideshow": { "slide_type": "slide" }, "tags": [] }, "source": [ "### Class identity\n", "\n", "Many tools in the ecosystem including scanpy frequently check if the input object is an AnnData. 
For instance, [in `sc.pp.highly_variable_genes`](https://github.com/scverse/scanpy/blob/master/scanpy/preprocessing/_highly_variable_genes.py) it reads:\n", "\n", "```py\n", "if not isinstance(adata, AnnData):\n", " raise ValueError(\n", " '`pp.highly_variable_genes` expects an `AnnData` argument, '\n", " 'pass `inplace=False` if you want to return a `pd.DataFrame`.'\n", " )\n", "```\n", "\n", "In order for shadow objects to be accepted by such functions, they mock their class identity:" ] }, { "cell_type": "code", "execution_count": 17, "id": "f10b98ff-920f-4d46-924f-1cf3074236db", "metadata": { "slideshow": { "slide_type": "subslide" }, "tags": [] }, "outputs": [], "source": [ "mdata = MuDataShadow(file, mode=\"r\")\n", "\n", "from mudata import MuData\n", "assert isinstance(mdata, MuData), \"mdata is not a valid MuData object\"" ] }, { "cell_type": "code", "execution_count": 18, "id": "7796c156-b84e-46f9-90e4-fe18ad6b91d8", "metadata": { "slideshow": { "slide_type": "fragment" }, "tags": [] }, "outputs": [], "source": [ "from anndata import AnnData\n", "assert isinstance(mdata[\"rna\"], AnnData), \"mdata['rna'] is not a valid AnnData object\"" ] }, { "cell_type": "markdown", "id": "f8e2d4a9-eba2-45c0-88f6-35f69e7d0249", "metadata": { "slideshow": { "slide_type": "subslide" }, "tags": [] }, "source": [ "Checking for shadow identity still works:" ] }, { "cell_type": "code", "execution_count": 19, "id": "51cd4264-e9d0-4e2c-a536-835a0d3a699d", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "isinstance(mdata, MuDataShadow)" ] }, { "cell_type": "code", "execution_count": 20, "id": "efadd4ba-219c-4c84-a1eb-36baf135c82d", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "isinstance(mdata[\"rna\"], AnnDataShadow)" ] }, { "cell_type": "code", 
"execution_count": 21, "id": "a32515de-7866-4229-a639-0818a0dbea3b", "metadata": { "slideshow": { "slide_type": "fragment" }, "tags": [] }, "outputs": [], "source": [ "mdata.close()" ] }, { "cell_type": "markdown", "id": "8d4e683f-0a0b-426c-8cf7-5f5529a844d2", "metadata": {}, "source": [ " " ] }, { "cell_type": "markdown", "id": "c29f18b0-717b-4821-b0f8-e81ca94426de", "metadata": {}, "source": [ "### Backends\n", "\n", "AnnData/MuData are based on a NumPy/Pandas stack. This is the default for the shadow objects in order to provide compatibility with AnnData/MuData objects.\n", "\n", "However the nature of shadow files also simplifies loading individual matrices or tables with alternative backends, e.g. [JAX](https://jax.readthedocs.io/en/latest/_autosummary/jax.numpy.array.html#jax.numpy.array) (`Array`), [PyTorch](https://pytorch.org/docs/stable/tensors.html) (`Tensor`) or [polars](https://pola-rs.github.io/polars/py-polars/html/reference/dataframe/index.html) (`DataFrame`)." ] }, { "cell_type": "code", "execution_count": 22, "id": "734d4e9e-3936-4911-96fe-1bed3de167eb", "metadata": {}, "outputs": [], "source": [ "mdata = MuDataShadow(file, array_backend=\"jax\", table_backend=\"polars\")" ] }, { "cell_type": "code", "execution_count": 23, "id": "3d909ef6-92b7-40f4-b50e-641993469791", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n" ] }, { "data": { "text/html": [ "
" ], "text/plain": [ "shape: (5, 7)\n", "┌──────────────┬──────────────┬────────┬──────────────┬──────────────┬──────────────┬──────────────┐\n", "│ _index ┆ celltype ┆ leiden ┆ n_genes_by_c ┆ pct_counts_m ┆ total_counts ┆ total_counts │\n", "│ --- ┆ --- ┆ --- ┆ ounts ┆ t ┆ --- ┆ _mt │\n", "│ object ┆ cat ┆ cat ┆ --- ┆ --- ┆ f32 ┆ --- │\n", "│ ┆ ┆ ┆ i32 ┆ f32 ┆ ┆ f32 │\n", "╞══════════════╪══════════════╪════════╪══════════════╪══════════════╪══════════════╪══════════════╡\n", "│ AAACCCAAGAGA ┆ intermediate ┆ 3 ┆ 2363 ┆ 6.332204 ┆ 7375.0 ┆ 467.0 │\n", "│ CAAG-1 ┆ mono ┆ ┆ ┆ ┆ ┆ │\n", "│ AAACCCAAGGCC ┆ CD4+ naïve T ┆ 0 ┆ 1259 ┆ 9.093319 ┆ 3772.0 ┆ 343.0 │\n", "│ TAGA-1 ┆ ┆ ┆ ┆ ┆ ┆ │\n", "│ AAACCCAGTCGT ┆ CD4+ memory ┆ 2 ┆ 1578 ┆ 13.178295 ┆ 4902.0 ┆ 646.0 │\n", "│ GCCA-1 ┆ T ┆ ┆ ┆ ┆ ┆ │\n", "│ AAACCCATCGTG ┆ CD4+ memory ┆ 2 ┆ 1908 ┆ 6.354415 ┆ 6704.0 ┆ 426.0 │\n", "│ CATA-1 ┆ T ┆ ┆ ┆ ┆ ┆ │\n", "│ AAACGAAAGACA ┆ CD14 mono ┆ 1 ┆ 1589 ┆ 9.307693 ┆ 3900.0 ┆ 363.0 │\n", "│ AGCC-1 ┆ ┆ ┆ ┆ ┆ ┆ │\n", "└──────────────┴──────────────┴────────┴──────────────┴──────────────┴──────────────┴──────────────┘" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "obs = mdata[\"rna\"].obs\n", "print(type(obs))\n", "obs.head()" ] }, { "cell_type": "code", "execution_count": 24, "id": "32286100-13e4-49af-8194-f53693c9b7f0", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n" ] }, { "data": { "text/plain": [ "DeviceArray([[ 20.551052 , 0.36840764, -1.6193684 , ...,\n", " 0.09656975, -0.90912175, -0.77955467],\n", " [ -9.47144 , -5.5212517 , -5.107428 , ...,\n", " 0.64674896, -0.892091 , 1.7873902 ],\n", " [ -9.913012 , 2.766899 , -2.0684972 , ...,\n", " -0.6454743 , 1.615869 , -0.63476324],\n", " ...,\n", " [ -8.727723 , 7.9196725 , 1.3326805 , ...,\n", " 1.4592032 , 0.91210324, 1.3184382 ],\n", " [-10.792531 , 3.2086673 , -2.0437238 , ...,\n", " 1.7311838 , -1.840564 , 1.3253008 ],\n", " [ 20.642431 , 
0.49294943, -1.6694897 , ...,\n", " -0.51208967, 0.60652566, -0.75145006]], dtype=float32)" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "rna_pca = mdata[\"rna\"].obsm[\"X_pca\"]\n", "print(type(rna_pca))\n", "rna_pca" ] }, { "cell_type": "markdown", "id": "6cdad910-a34c-49d2-bc03-87bfde9417c9", "metadata": {}, "source": [ "When alternative backends are used, not all AnnData/MuData features can be supported, and many external tools might not work as expected because they anticipate NumPy/Pandas objects." ] }, { "cell_type": "code", "execution_count": 25, "id": "b06a9071-0443-41e6-ac81-e3f0ce2653e9", "metadata": {}, "outputs": [], "source": [ "# Clean up\n", "mdata.clear_cache()\n", "mdata.close()\n", "del mdata, rna_pca, obs" ] }, { "cell_type": "markdown", "id": "6c474c9e-dfea-406c-ace6-461e8d5438a4", "metadata": {}, "source": [ " " ] }, { "cell_type": "markdown", "id": "16f9b372-a089-4aed-b91e-b368a2ddc13e", "metadata": { "slideshow": { "slide_type": "slide" }, "tags": [] }, "source": [ "### Partial writing\n", "\n", "> [!NOTE]\n", "> This feature is experimental.\n", "\n", "While the main use of shadows is to provide a low-memory read-only solution for scverse datasets, the ability to add new embeddings or other items to the file can greatly extend their usage patterns." 
] }, { "cell_type": "code", "execution_count": 9, "id": "02245bc0-cc92-4fe7-b665-a4e2f424b353", "metadata": { "slideshow": { "slide_type": "fragment" }, "tags": [] }, "outputs": [], "source": [ "mdata = MuDataShadow(file, mode=\"r\")" ] }, { "cell_type": "markdown", "id": "c7324f1c-c4a4-4561-9680-0ac5caacc79f", "metadata": {}, "source": [ "Add a new embedding to the in-memory object:" ] }, { "cell_type": "code", "execution_count": 10, "id": "eb6f076f-0b26-428b-a824-a82b3d648c00", "metadata": { "slideshow": { "slide_type": "fragment" }, "tags": [] }, "outputs": [ { "data": { "text/plain": [ "obsm:\tX_pcaᐁ, X_pca_copyᐁ, X_umap" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "mdata[\"rna\"].obsm[\"X_pca_copy\"] = mdata[\"rna\"].obsm[\"X_pca\"].copy()\n", "mdata[\"rna\"].obsm" ] }, { "cell_type": "markdown", "id": "0a7a6374-cb13-4f3a-8f5b-e0c4b4f89363", "metadata": { "slideshow": { "slide_type": "subslide" }, "tags": [] }, "source": [ "For this, a family of methods is useful, including `.reopen()` and `.write()`. The `.write()` method only works if the connection is not read-only (e.g. `'r+'`); a read-only connection can first be reopened in another mode.\n", "\n", "Internally, `.write()` pushes (`._push_changes()`) the in-memory changes (marked with ▲ in the object representation above) to the file and provides meaningful error messages when the file is not open for writing.\n", "\n", "This separation of concerns makes it transparent when the data is modified, and this workflow is recommended when only a small amount of data is added to the file. 
As the methods return the shadow itself, it is possible to chain them:" ] }, { "cell_type": "code", "execution_count": 11, "id": "bcfa2982-4bf6-42eb-a604-d17d6496598b", "metadata": { "slideshow": { "slide_type": "fragment" }, "tags": [] }, "outputs": [ { "data": { "text/plain": [ "obsm:\tX_pcaᐁ, X_pca_copyᐁ, X_umap" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "mdata.reopen(mode='r+').write(clear_cache=True).reopen(mode='r'); # clear pushed elements from cache\n", "mdata[\"rna\"].obsm" ] }, { "cell_type": "code", "execution_count": 12, "id": "b03d8f00-6a61-44ec-aa69-fbd01b43c886", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'r'" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "mdata.file.mode" ] }, { "cell_type": "code", "execution_count": 13, "id": "1b794d6e-3cf2-4451-9a96-972aec79fc82", "metadata": {}, "outputs": [], "source": [ "mdata.clear_cache()" ] }, { "cell_type": "markdown", "id": "af3d311e-0199-4dcf-b5a5-15b8e446fd08", "metadata": {}, "source": [ " " ] }, { "cell_type": "markdown", "id": "1b128596-dbb5-4469-a346-bd14cda79eb3", "metadata": {}, "source": [ "The default mode is read-only, which protects the file from being modified while also allowing multiple connections to it:" ] }, { "cell_type": "code", "execution_count": 17, "id": "8e817c96-ae69-49d7-a574-58481170f011", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Not available for .write(): File is open in read-only mode. Changes can't be pushed. 
Reopen it with .reopen('r+') to enable writing.\n" ] } ], "source": [ "try:\n", " mdata.write()\n", "except OSError as e:\n", " print(\"Not available for .write():\", e)" ] }, { "cell_type": "markdown", "id": "2e68cef8-871f-49be-8829-f59ff9d93f99", "metadata": {}, "source": [ " " ] }, { "cell_type": "markdown", "id": "8b5c17b8-98d1-42b6-a008-b3c3b6fbfb79", "metadata": {}, "source": [ "> [!NOTE]\n", "> Partial writing is currently intended to add new elements to the dataset on disk (e.g. a new embedding to .obsm) rather than to modify the dataset and delete or alter existing elements." ] }, { "cell_type": "markdown", "id": "e841d95f-3f46-4902-b18f-eb4c7080e58d", "metadata": {}, "source": [ " " ] }, { "cell_type": "markdown", "id": "e0c11265-8429-4a34-a552-759b1f07a0bc", "metadata": { "tags": [] }, "source": [ "### Views\n", "\n", "Views for shadow objects are conceptually similar to [views in AnnData/MuData](https://anndata.readthedocs.io/en/latest/generated/anndata.AnnData.is_view.html): they provide a view into an existing object without creating its copy.\n", "\n", "As shadow objects inherently operate on the file they are connected to, their views behave slightly differently. Creating a view creates a new connection to the file and returns a new shadow object, which is aware of the part of the data (e.g. which cells) it is supposed to provide a view for." 
] }, { "cell_type": "code", "execution_count": 18, "id": "c3ea6e33-128a-48fd-a421-0c9f5801e47d", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "View of MuData Shadow object with n_obs × n_vars = 612 × 17838 (original 3891 × 17838)\n", " obs:\t_index, leiden, leiden_wnn, louvain\n", " var:\t_index, feature_types, gene_ids, highly_variable\n", " obsm:\tX_mofa, X_mofa_umap, X_umap, X_wnn_umap, prot, rna\n", " varm:\tLFs, prot, rna\n", " obsp:\tconnectivities, distances, wnn_connectivities, wnn_distances\n", " uns:\tleiden, leiden_wnn_colors, louvain, neighbors, rna:celltype_colors, umap, wnn\n", " obsmap:\tprot, rna\n", " varmap:\tprot, rna\n", " mod:\t2 modalities\n", " prot: 612 x 32\n", " X \n", " layers:\tcounts\n", " obs:\t_index\n", " var:\t_index, feature_types, gene_ids, highly_variable\n", " obsm:\tX_pca, X_umap\n", " varm:\tPCs\n", " obsp:\tconnectivities, distances\n", " uns:\tneighbors, pca, umap\n", " rna: 612 x 17806\n", " X \n", " raw:\tX, var, varm\n", " obs:\t_index, celltype, leiden, n_genes_by_counts, pct_counts_mt, total_counts, total_counts_mt\n", " var:\t_index, dispersions, dispersions_norm, feature_types, gene_ids, highly_variable, mean, mean_counts, means, mt, n_cells_by_counts, pct_dropout_by_counts, std, total_counts\n", " obsm:\tX_pca, X_pca_copy, X_umap\n", " varm:\tPCs\n", " obsp:\tconnectivities, distances\n", " uns:\tcelltype_colors, hvg, leiden, leiden_colors, neighbors, pca, rank_genes_groups, umap" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "monocytes = mdata['rna'].obs['celltype'].values == \"CD14 mono\"\n", "monocytes_view = mdata[monocytes]\n", "monocytes_view" ] }, { "cell_type": "markdown", "id": "2f115798-96d2-4660-889d-b3e9a2d154c3", "metadata": {}, "source": [ "Individual modalities of a MuData Shadow View are sliced accordingly:" ] }, { "cell_type": "code", "execution_count": 19, "id": "13f4b379-e26d-4677-9de3-42b3754af15d", "metadata": {}, "outputs": [ { "data": { 
"text/plain": [ "(612, 50)" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "monocytes_view['rna'].obsm[\"X_pca\"].shape" ] }, { "cell_type": "code", "execution_count": 20, "id": "585fcbc6-9d5f-406f-99e1-6b91117e2bac", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "obsm:\tX_pcaᐁ, X_pca_copy, X_umap" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "monocytes_view['rna'].obsm" ] }, { "cell_type": "markdown", "id": "8fbdbb1f-9e35-44aa-aad8-b1f67f827fbd", "metadata": {}, "source": [ "The cache is specific to each view:" ] }, { "cell_type": "code", "execution_count": 21, "id": "d68cc6ea-de8d-4801-9667-4fa059609d85", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "obsm:\tX_pca, X_pca_copy, X_umap" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "mdata['rna'].obsm # X_pca is not cached" ] }, { "cell_type": "markdown", "id": "e511214b-52a4-4f63-9275-b267b779ecc9", "metadata": {}, "source": [ "Moreover, these semantics make it possible to create views of views of views..." 
] }, { "cell_type": "code", "execution_count": 22, "id": "229da4ce-df96-45b6-a6a4-4b44ee6749f5", "metadata": {}, "outputs": [], "source": [ "adata = AnnDataShadow(file / \"mod/rna\")" ] }, { "cell_type": "code", "execution_count": 23, "id": "30cbefc7-1e59-447c-8413-de8ef34be30b", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "View of AnnData Shadow object with n_obs × n_vars = 7 × 30 (original 3891 × 17806)\n", " X \n", " raw:\tX, var, varm\n", " obs:\t_index, celltype, leiden, n_genes_by_counts, pct_counts_mt, total_counts, total_counts_mt\n", " var:\t_index, dispersions, dispersions_norm, feature_types, gene_ids, highly_variable, mean, mean_counts, means, mt, n_cells_by_counts, pct_dropout_by_counts, std, total_counts\n", " obsm:\tX_pca, X_pca_copy, X_umap\n", " varm:\tPCs\n", " obsp:\tconnectivities, distances\n", " uns:\tcelltype_colors, hvg, leiden, leiden_colors, neighbors, pca, rank_genes_groups, umap" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "view = adata[3:10,:30]\n", "view" ] }, { "cell_type": "code", "execution_count": 24, "id": "bfa15c8a-f4a8-4907-939f-5cb80ef50abc", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "View of AnnData Shadow object with n_obs × n_vars = 2 × 3 (original 3891 × 17806)\n", " X \n", " raw:\tX, var, varm\n", " obs:\t_index, celltype, leiden, n_genes_by_counts, pct_counts_mt, total_counts, total_counts_mt\n", " var:\t_index, dispersions, dispersions_norm, feature_types, gene_ids, highly_variable, mean, mean_counts, means, mt, n_cells_by_counts, pct_dropout_by_counts, std, total_counts\n", " obsm:\tX_pca, X_pca_copy, X_umap\n", " varm:\tPCs\n", " obsp:\tconnectivities, distances\n", " uns:\tcelltype_colors, hvg, leiden, leiden_colors, neighbors, pca, rank_genes_groups, umap" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "nested_view = view[:2,-3:]\n", "nested_view" ] }, { "cell_type": "markdown", "id": 
"6e3ce502-40e6-4b40-b78e-cf86e527bf18", "metadata": {}, "source": [ "Getting attributes from views is no different than for shadow objects:" ] }, { "cell_type": "code", "execution_count": 25, "id": "216d5cd3-5457-4145-952b-61bed2be9f7d", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " n_genes_by_counts total_counts total_counts_mt \\\n", "AAACCCATCGTGCATA-1 1908 6704.0 426.0 \n", "AAACGAAAGACAAGCC-1 1589 3900.0 363.0 \n", "\n", " pct_counts_mt leiden celltype \n", "AAACCCATCGTGCATA-1 6.354415 2 CD4+ memory T \n", "AAACGAAAGACAAGCC-1 9.307693 1 CD14 mono " ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "nested_view.obs" ] }, { "cell_type": "markdown", "id": "9dbacf34-247e-4ac9-995b-f39656491973", "metadata": {}, "source": [ "... as they are shadow objects themselves:" ] }, { "cell_type": "code", "execution_count": 26, "id": "c0921236-cc65-43fc-a9a1-557d4ab0a1c6", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "shadows.anndatashadow.AnnDataShadow" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "type(nested_view)" ] }, { "cell_type": "code", "execution_count": 27, "id": "e70179b3-da72-4155-bbf9-b6f9d1fa8d47", "metadata": {}, "outputs": [], "source": [ "# Clean up\n", "nested_view.close()\n", "view.close()\n", "del nested_view, view\n", "\n", "monocytes_view.close()\n", "mdata.close()\n", "del monocytes_view, mdata" ] }, { "cell_type": "markdown", "id": "ed55ed1b-1d8e-4250-9352-75f59cd5551a", "metadata": {}, "source": [ " " ] }, { "cell_type": "markdown", "id": "ab4a745e-df8c-46f5-9c3d-d2d3678fff5f", "metadata": { "slideshow": { "slide_type": "slide" }, "tags": [] }, "source": [ "### Per-feature access to datasets on disk\n", "\n", "Per-feature access is currently not possible, as caching works at the level of individual HDF5 datasets.\n", "\n", "Views may read only the necessary parts of the arrays into memory; however, this behaviour is currently not universal.\n", "\n", "For example:" ] }, { "cell_type": "code", "execution_count": 28, "id": "ff5c4052-0929-43c3-947f-6de72b78d69e", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(10, 100)" ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } 
], "source": [ 
"adata_subset = adata[:10,:100]\n", "adata_subset.X.shape" ] }, { "cell_type": "code", "execution_count": 29, "id": "e410e6e1-34c8-48f5-88b5-a45a0545e342", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "View of AnnData Shadow object with n_obs × n_vars = 10 × 100 (original 3891 × 17806)\n", " X ᐁ \n", " raw:\tX, var, varm\n", " obs:\t_index, celltype, leiden, n_genes_by_counts, pct_counts_mt, total_counts, total_counts_mt\n", " var:\t_index, dispersions, dispersions_norm, feature_types, gene_ids, highly_variable, mean, mean_counts, means, mt, n_cells_by_counts, pct_dropout_by_counts, std, total_counts\n", " obsm:\tX_pca, X_pca_copy, X_umap\n", " varm:\tPCs\n", " obsp:\tconnectivities, distances\n", " uns:\tcelltype_colors, hvg, leiden, leiden_colors, neighbors, pca, rank_genes_groups, umap" ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "adata_subset" ] }, { "cell_type": "code", "execution_count": 30, "id": "bf2a317a-ca82-4a73-b0ef-07d0cfac2128", "metadata": {}, "outputs": [], "source": [ "# Clean up\n", "adata.close()\n", "adata_subset.close()\n", "del adata, adata_subset" ] }, { "cell_type": "markdown", "id": "bb50af6a-4ee2-4a8f-b022-9b0daa63e81e", "metadata": {}, "source": [ " " ] }, { "cell_type": "markdown", "id": "fec4c262-5bbf-4393-b082-f208f7997a7a", "metadata": { "slideshow": { "slide_type": "slide" }, "tags": [] }, "source": [ "---\n", "\n", "In order to return the data to its original state, let's manually remove the items we wrote to the file:" ] }, { "cell_type": "code", "execution_count": 31, "id": "46550ff4-39e1-40e6-80d0-4fd45d99af84", "metadata": { "slideshow": { "slide_type": "fragment" }, "tags": [] }, "outputs": [], "source": [ "import h5py\n", "\n", "f = h5py.File(file, \"a\")\n", "# ^\n", "# ____________|\n", "# if this works, \n", "# no dangling read-only connections!\n", "# \n", "\n", "del f[\"mod/rna/obsm/X_pca_copy\"]\n", "f.close()" ] }, { "cell_type": "markdown", "id": 
"6bc6a57c-39d0-45ad-be01-8cadde33da83", "metadata": {}, "source": [ " " ] }, { "cell_type": "markdown", "id": "752bd981-1cbd-43ec-b707-9308afb7e55f", "metadata": {}, "source": [ " " ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.11" } }, "nbformat": 4, "nbformat_minor": 5 }