✨ What's new
Version ≥ 0.30.6 (Jun 10, 2026):
- gget blat: Improved resilience against UCSC BLAT endpoint failures (fixes intermittently failing tests).
  - Added retry with exponential backoff for transient failures (HTTP 429/5xx, network errors, and non-JSON 200 responses caused by UCSC rate-limiting or HTML error pages): up to 4 attempts, with 1.5 s → 3 s → 6 s backoff between them.
  - Replaced the misleading "sequence too short or assembly invalid" message with the actual server response (status code and a preview of the response body), so failures can be diagnosed.
  - HTTPError and URLError are now caught explicitly instead of bubbling up as unhandled exceptions.
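The gget blat retry schedule described above (up to 4 attempts, waiting 1.5 s, 3 s, then 6 s between them) can be sketched with the standard library alone. This is a hypothetical illustration, not gget's implementation. The names `with_backoff`, `backoff_delays`, and `TransientError` are invented for the example, and deciding which responses count as transient (429/5xx, network errors, non-JSON 200s) is left to whatever raises `TransientError`.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for retryable failures (e.g. HTTP 429/5xx, network errors)."""


def backoff_delays(attempts: int = 4, base: float = 1.5) -> List[float]:
    """Waits slept *between* attempts: 4 attempts -> [1.5, 3.0, 6.0] seconds."""
    return [base * 2 ** i for i in range(attempts - 1)]


def with_backoff(
    call: Callable[[], T],
    attempts: int = 4,
    base: float = 1.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `call`, retrying on TransientError with exponential backoff.

    Any other exception propagates immediately. After the last failed
    attempt, the final TransientError is re-raised to the caller.
    """
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    delays = backoff_delays(attempts, base)
    for attempt in range(attempts):
        try:
            return call()
        except TransientError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(delays[attempt])
    raise AssertionError("unreachable")  # the loop always returns or raises
```

Passing `sleep` in as a parameter is only a testing convenience. It lets the schedule be checked without actually waiting.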
- Bug fixes:
  - gget cosmic: Fixed a misleading error message when the download step fails. It previously reported the previous command's return code/stderr instead of the failing command's.
  - gget cosmic: Narrowed the JSON parse exception handler to json.JSONDecodeError, so unrelated ValueErrors are no longer masked by the "Failed to download file" message.
  - gget --version, gget --help, gget invoked with no arguments, and gget <module> with no further arguments now all exit with status 0 instead of 1. CI scripts and shell pipelines therefore no longer treat these informational outputs as failures.
  - Added request timeouts to previously unguarded requests calls in gget ref, gget info, gget 8cube, gget enrichr, and gget opentargets. The default is 10 s connect / 60 s read and can be configured via the new DEFAULT_REQUESTS_TIMEOUT constant.
  - Narrowed a bare except: in utils.get_uniprot_seqs to (KeyError, IndexError, TypeError), so unrelated errors (including KeyboardInterrupt) are no longer swallowed.
  - Added utils.http_json() and utils.dig() helpers. They issue a request and parse the JSON, or walk a nested response path, with consistent error reporting. gget bgee, gget opentargets, and one .json() call site in gget virus now use them; the remaining modules will migrate opportunistically. Upstream HTML error pages, malformed JSON, and missing response keys now surface as clear RuntimeErrors naming the failing service, instead of cryptic JSONDecodeError/KeyError tracebacks.
  - utils.http_json() now retries transient failures (connection errors, read timeouts, HTTP 5xx) up to 3 times with exponential backoff. This smooths over short upstream blips (e.g., bgee.org read timeouts) without affecting 4xx errors, which still raise immediately.
  - gget virus: Replaced 11 bare except: pass blocks around file.close()/os.remove() cleanup calls with narrowed except OSError handlers that log the failure at DEBUG level. Previously, real I/O issues during cleanup (disk full, permissions) were silently dropped, and the cleanup path also swallowed KeyboardInterrupt.
  - gget cbio: Fixed a code path in cbio_plot that called DataFrame.append() (removed in pandas 2.0) inside a loop when filling in missing CNA genes. The entire branch crashed on modern pandas. It now builds a single DataFrame of missing rows and concatenates once.
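The utils.dig() behavior described above (walk a nested JSON path; report a missing key as a RuntimeError that names the upstream service) can be sketched as follows. This is a hypothetical re-creation based only on that description. The signature `dig(data, path, service)` is an assumption, and gget's real helper may differ.

```python
from typing import Any, Sequence, Union

Key = Union[str, int]


def dig(data: Any, path: Sequence[Key], service: str) -> Any:
    """Follow `path` through nested dicts/lists (hypothetical signature).

    Raises RuntimeError naming `service`, instead of a bare KeyError,
    IndexError, or TypeError, when the response lacks the expected shape.
    """
    current = data
    for depth, key in enumerate(path):
        try:
            current = current[key]
        except (KeyError, IndexError, TypeError) as exc:
            walked = " -> ".join(map(repr, path[: depth + 1]))
            raise RuntimeError(
                f"{service} returned an unexpected response: missing {walked}"
            ) from exc
    return current
```

The exception tuple matches the narrowed `(KeyError, IndexError, TypeError)` handler mentioned above, so unrelated errors such as KeyboardInterrupt still propagate.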
- Performance:
  - utils.get_uniprot_seqs: Now collects per-ID DataFrames in a list and calls pd.concat(..., ignore_index=True) once at the end, avoiding the O(n²) cost of growing a DataFrame inside the request loop.
  - Cached utils.find_latest_ens_rel, utils.search_species_options, utils.ref_species_options, and utils.find_nv_kingdom with functools.lru_cache. These functions hit Ensembl FTP listings that are stable for a release, so repeated calls within one Python process are now free.
  - Added utils.parallel_map, a thin ThreadPoolExecutor wrapper for I/O-bound work. It is used to fan out utils.get_uniprot_seqs across the input ID list, so looking up N IDs is now bounded by roughly N / pool_size UniProt round-trips instead of N. The pool size defaults to 8 and can be overridden via the GGET_MAX_WORKERS environment variable.
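A thin ThreadPoolExecutor wrapper of the kind described for utils.parallel_map might look like the sketch below. It is hypothetical. Only the default of 8 workers and the GGET_MAX_WORKERS override come from the notes above; the function signatures are assumptions. Executor.map yields results in input order, so outputs line up with the input ID list even when requests finish out of order.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, Optional, TypeVar

T = TypeVar("T")
R = TypeVar("R")

DEFAULT_MAX_WORKERS = 8  # default pool size stated in the release notes


def pool_size(override: Optional[int] = None) -> int:
    """Explicit argument > GGET_MAX_WORKERS env var > default of 8."""
    if override is not None:
        return max(1, override)
    env = os.environ.get("GGET_MAX_WORKERS")
    if env and env.strip().isdigit():
        return max(1, int(env))
    return DEFAULT_MAX_WORKERS  # unset or non-numeric value


def parallel_map(
    func: Callable[[T], R],
    items: Iterable[T],
    max_workers: Optional[int] = None,
) -> List[R]:
    """Apply `func` to each item on a thread pool (suited to I/O-bound calls).

    Results come back in input order; the first exception raised by
    `func` propagates to the caller when its result is reached.
    """
    with ThreadPoolExecutor(max_workers=pool_size(max_workers)) as pool:
        return list(pool.map(func, items))
```

Threads suit this workload because each call spends most of its time waiting on the network, and the GIL is released during that wait.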
Version ≥ 0.30.5 (May 23, 2026):
- gget opentargets: Rewrote this module to reflect the new Open Targets API structure.
  - Some output column/key names may differ to reflect the new API structure.
  - Removed the --filter_mode argument.
- gget blast: Fixed compatibility with newer pandas versions (≥ 2.0), where pd.read_html() no longer accepts raw HTML strings directly. This caused a FileNotFoundError / "OSError: Filename too long" error when parsing BLAST results.
- gget cosmic: Added overwrite and gzip arguments to internals.
Version ≥ 0.30.3 (Feb 26, 2026):
- gget virus: New filtering options, quiet mode, and improved download reliability.
  - Added a --segment filter for segmented viruses (e.g., Influenza A segments such as 'HA', 'NA', 'PB1').
  - Added a --vaccine_strain filter to include or exclude vaccine strain sequences.
  - Added a --source_database filter to select sequences from 'genbank' or 'refseq' (replaces refseqOnly).
  - Added a -q/--quiet flag to suppress progress information.
  - Extended fallback strategies for improved download reliability on large datasets.
  - The command summary file now includes the software version.
Version ≥ 0.30.2 (Feb 08, 2026):
- gget virus: Metadata streaming optimization, improved protein filtering, and enhanced error handling and retry logic.
  - Metadata now streams to disk during fetch to prevent memory exhaustion on large datasets (100,000+ records).
  - Fixed the metadata CSV mapping (camelCase → snake_case) for organism name, host, and collection date.
  - Enhanced protein filtering for segmented viruses with improved FASTA header parsing.
  - Added an annotated=False option for filtering unannotated sequences.
  - Added progress bars to batched sequence downloads.
  - Fixed a collection date naming bug.
  - Improved error messages for invalid filter dates.
  - Added additional retry attempts for virus name resolution.
  - Added verbosity to the influenza A and COVID-19 checking steps.
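Streaming metadata to disk means writing each record as soon as it arrives, instead of collecting the full result set in memory first. The sketch below combines that with a camelCase → snake_case header mapping. It is hypothetical: the field names (organismName, host, collectionDate) are illustrative, and the regex-based conversion is an assumption, not gget's actual mapping table.

```python
import csv
import re
from typing import IO, Dict, Iterable, List

# Zero-width match before every uppercase letter except at the start.
_CAMEL_BOUNDARY = re.compile(r"(?<!^)(?=[A-Z])")


def camel_to_snake(name: str) -> str:
    """organismName -> organism_name, collectionDate -> collection_date."""
    return _CAMEL_BOUNDARY.sub("_", name).lower()


def stream_metadata(
    records: Iterable[Dict[str, str]],
    out: IO[str],
    fields: List[str],
) -> int:
    """Write records to `out` one at a time and return the row count.

    `records` can be a generator that pages through an API, so the full
    result set never has to be held in memory at once.
    """
    writer = csv.DictWriter(
        out,
        fieldnames=[camel_to_snake(f) for f in fields],
        extrasaction="ignore",  # drop keys that are not requested columns
    )
    writer.writeheader()
    count = 0
    for record in records:
        writer.writerow({camel_to_snake(k): v for k, v in record.items()})
        count += 1
    return count
```

Memory use stays flat only if the caller also avoids materializing `records`, for example by passing a generator rather than a list.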
Version ≥ 0.30.0 (Jan 19, 2026):
- NEW MODULES:
- SECURITY IMPROVEMENTS:
  - Replaced os.system() calls that used f-strings containing URLs from external APIs in gget/main.py.
  - Replaced exec() with importlib.import_module() in gget setup for safer dynamic imports.
  - Replaced shell=True subprocess calls with list-based arguments in gget muscle, gget diamond, and gget setup to prevent command injection.
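The two substitutions above follow standard Python practice. When argv is passed as a list, each element reaches the program as one literal argument, so shell metacharacters in user- or API-supplied strings are never interpreted. importlib.import_module() loads a module by name without executing an arbitrary code string. The minimal sketch below uses only the standard library. The helpers `run_tool` and `load_module` are hypothetical and not part of gget.

```python
import importlib
import subprocess
import sys
from types import ModuleType
from typing import List


def run_tool(argv: List[str]) -> str:
    """Run a command without a shell; each list element is one literal argument."""
    result = subprocess.run(argv, check=True, capture_output=True, text=True)
    return result.stdout


def load_module(name: str) -> ModuleType:
    """Import a module by name instead of exec(f"import {name}")."""
    return importlib.import_module(name)


# A hostile-looking "file name" stays a plain string here. With shell=True
# and string interpolation, the "; echo pwned" part could run as a second command.
untrusted = "input.fa; echo pwned"
echoed = run_tool([sys.executable, "-c", "import sys; print(sys.argv[1])", untrusted])
```

The child process prints its first argument unchanged, which shows that the semicolon was never parsed by a shell.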
Version ≥ 0.29.3 (Sep 11, 2025):
- gget blat: Updated the API request to match new permissions.
- gget pdb: Added a wwPDB mirror; falls back to RCSB if wwPDB fails.
- gget cellxgene: Improved argument handling; the frontend is unchanged. Fixes issue 181.
- gget setup / gget alphafold: Fixed a pip_cmd bug in gget.setup("alphafold").
Version ≥ 0.29.2 (Jul 03, 2025):
- gget can now be installed using uv pip install gget.
- All package metadata (version, author, description, etc.) is now managed in setup.cfg for full compatibility with modern tools such as uv, pip, and PyPI.
- gget now uses a minimal setup.py and is fully PEP 517/518 compatible.
- gget setup now tries uv pip install first for speed and modern dependency resolution, and falls back to pip install if uv fails or is not available.
  - Users are informed at each step which installer is being used and whether a retry is happening.
- Note: Some scientific dependencies (e.g., cellxgene-census) may not yet support Python 3.12. If you encounter installation errors, try Python 3.9 or 3.10. (The pip installation may still succeed in these cases.)
- All required dependencies are now listed in setup.cfg under install_requires, so installing gget with pip install . or uv pip install . automatically installs all dependencies.
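The gget setup installer fallback (try uv pip install, then fall back to pip install) amounts to trying a list of installer commands in order. This is a hypothetical sketch, not gget's code. The function name, the use of shutil.which() to detect uv, passing --python so that uv targets the current interpreter, and the injectable runner are all assumptions made for the example.

```python
import shutil
import subprocess
import sys
from typing import Callable, List, Optional

Runner = Callable[[List[str]], int]


def _run(cmd: List[str]) -> int:
    """Default runner: execute the command without a shell, return its exit code."""
    return subprocess.run(cmd).returncode


def install(
    packages: List[str],
    runner: Runner = _run,
    uv_path: Optional[str] = None,
) -> List[str]:
    """Install with uv if available; otherwise, or if uv fails, use pip.

    Returns the command that succeeded; raises RuntimeError if all fail.
    """
    uv = uv_path if uv_path is not None else shutil.which("uv")
    candidates: List[List[str]] = []
    if uv:
        # Point uv at the running interpreter so both installers target the same env.
        candidates.append([uv, "pip", "install", "--python", sys.executable, *packages])
    candidates.append([sys.executable, "-m", "pip", "install", *packages])

    for cmd in candidates:
        installer = "uv" if uv and cmd[0] == uv else "pip"
        print(f"Installing {' '.join(packages)} with {installer}...")
        if runner(cmd) == 0:
            return cmd
        print(f"{installer} failed; trying the next installer if one is left.")
    raise RuntimeError(f"Could not install: {' '.join(packages)}")
```

Injecting `runner` keeps the fallback logic testable without installing anything. The printed messages mirror the release note that users are told which installer is in use and when a retry happens.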
Version ≥ 0.29.1 (Apr 21, 2025):
- gget mutate:
  - gget mutate has been simplified. It now focuses on taking as input a list of mutations and the associated reference genome with corresponding annotation information, and producing as output the sequences with the mutations incorporated, plus a short region of surrounding context. For the full functionality of the previous version and how it fits into a novel variant screening pipeline, visit the varseek repository being developed by members of the gget team at https://github.com/pachterlab/varseek.git.
  - Added additional information to the returned data frames, as described here: https://github.com/pachterlab/gget/pull/169
- gget cosmic:
  - Major restructuring of the gget cosmic module to adhere to new login requirements set by COSMIC.
  - New arguments email and password allow the user to pass their login credentials directly, instead of being required to enter them interactively during data download.
  - Default changed: gget_mutate=False.
  - Deprecated argument: entity.
  - The argument mutation_class is now cosmic_project.
- gget bgee:
  - type="orthologs" is now the default, removing the need to specify the type argument when querying orthologs.
  - Allows querying multiple genes at once.
- gget diamond:
  - Now supports translated alignment of nucleotide sequences to amino acid reference sequences using the --translated flag.
- gget elm:
  - Improved server error handling.
Version ≥ 0.29.0 (Sep 25, 2024):
- New modules:
- gget enrichr now also supports species other than human and mouse (fly, yeast, worm, and fish) via modEnrichR.
- gget mutate: gget mutate now merges identical sequences in the final file by default. Mutation creation was vectorized to decrease runtime. Improved the flanking sequence check for non-substitution mutations to make sure no wildtype k-mer is retained in the mutation-containing sequence. Added several new arguments to customize sequence generation and output.
- gget cosmic: Added support for targeted as well as gene screens. The CSV file created for gget mutate now also contains protein mutation info.
- gget ref: Added an out file option.
- gget info and gget seq: Switched to the Ensembl POST API to increase speed (nothing changes in the front end).
- Other "behind the scenes" changes:
  - Unit tests reorganized to increase speed and reduce the amount of test code
  - Requirements updated to allow newer mysql-connector versions
  - Support for NumPy >= 2.0
Version ≥ 0.28.6 (Jun 2, 2024):
- New module: gget mutate
- gget cosmic: You can now download entire COSMIC databases using the download_cosmic argument.
- gget ref: Can now fetch the GRCh37 genome assembly using species='human_grch37'.
- gget search: Adjusted access to human data to match the structure of Ensembl release 112 (fixes issue 129).
Version ≥ 0.28.5 (May 29, 2024):
- Yanked due to a logging bug in gget.setup("alphafold"), and because inversion mutations in gget mutate only reversed the string instead of also computing the complementary strand.
Version ≥ 0.28.4 (January 31, 2024):
- gget setup: Fixed a file path bug when running gget.setup("elm") on Windows.
Version ≥ 0.28.3 (January 22, 2024):
- gget search and gget ref now also support fungi 🍄, protists 🌝, and invertebrate metazoa 🐝 🐜 🐌 🐙 (in addition to vertebrates and plants).
- New module: gget cosmic
- gget enrichr: Fixed duplicate scatter dots in the plot when pathway names are duplicated.
- gget elm:
  - Changed the ortho results column name 'Ortholog_UniProt_ID' to 'Ortholog_UniProt_Acc' to correctly reflect the column contents, which are UniProt Accessions. 'UniProt ID' was changed to 'UniProt Acc' in the documentation for all gget modules.
  - Changed the ortho results column name 'motif_in_query' to 'motif_inside_subject_query_overlap'.
  - Added interaction domain information to the results (new columns: "InteractionDomainId", "InteractionDomainDescription", "InteractionDomainName").
  - The regex string for regular expression matches is now wrapped as "(?=(regex))" instead of being passed directly as "regex". This captures all occurrences of a motif when the motif length is variable and there are repeats in the sequence (https://regex101.com/r/HUWLlZ/1).
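The "(?=(regex))" wrapping used by gget elm works because a lookahead consumes no characters. The regex engine attempts a match at every position, and the inner group still captures the matched text. A plain pattern resumes scanning after the end of each match, so overlapping occurrences are skipped. A short illustration with Python's re module follows. The motif and sequence are made up and unrelated to actual ELM motifs, and `find_motifs` is a hypothetical helper name.

```python
import re
from typing import List, Tuple


def find_motifs(regex: str, seq: str, overlapping: bool = True) -> List[Tuple[int, str]]:
    """Return (start, matched_text) pairs for occurrences of `regex` in `seq`.

    With overlapping=True the pattern is wrapped as (?=(regex)): the
    zero-width lookahead lets the scan restart at the very next position,
    while group 1 still captures the motif text.
    """
    pattern = f"(?=({regex}))" if overlapping else f"({regex})"
    return [(m.start(), m.group(1)) for m in re.finditer(pattern, seq)]


seq = "MKKKAPKKKA"
plain = find_motifs("K+A", seq, overlapping=False)
# plain   -> [(1, 'KKKA'), (6, 'KKKA')]
wrapped = find_motifs("K+A", seq)
# wrapped -> [(1, 'KKKA'), (2, 'KKA'), (3, 'KA'), (6, 'KKKA'), (7, 'KKA'), (8, 'KA')]
```

The plain pattern reports only the two longest, non-overlapping hits. The wrapped pattern also reports every shorter occurrence starting inside them.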
- gget setup: Use the out argument to specify a directory the ELM database will be downloaded into. Completes this feature request.
- gget diamond: The DIAMOND command now runs with the --ignore-warnings flag, allowing niche sequences such as amino acid sequences that contain only nucleotide characters, and repeated sequences. This also applies to DIAMOND alignments performed within gget elm.
- gget ref and gget search back-end change: The current Ensembl release is fetched from the new release file on the Ensembl FTP site to avoid errors during uploads of new releases.
- gget search:
  - FTP link results (--ftp) are saved in txt file format instead of JSON.
  - Fixed URL links to the Ensembl gene summary for species with a subspecies name and for invertebrates.
- gget ref:
  - Back-end changes to increase speed.
  - New argument: list_iv_species lists all available invertebrate species (can be combined with the release argument to fetch all species available from a specific Ensembl release).
Version ≥ 0.28.2 (November 15, 2023):
- gget info: Returns a logging error message when the NCBI server fails for a reason other than a fetch failure (this is an error on the server side rather than an error with gget).
- Replaced the deprecated 'text' argument to find()-type methods wherever it was used with the BeautifulSoup dependency.
- gget elm: Removed false positive and true negative instances from the returned results.
- gget elm: Added the expand argument.
Version ≥ 0.28.0 (November 5, 2023):
- Updated the gget muscle documentation with a tutorial on visualizing sequences with varying sequence name lengths, and slightly changed the returned visualization so it is more robust to varying sequence names.
- gget muscle now also accepts a list of sequences as input (as an alternative to providing the path to a FASTA file).
- gget cellxgene: Allow a missing gene filter (fixes a bug).
- gget seq: Allow missing gene names (fixes https://github.com/pachterlab/gget/issues/107).
- gget enrichr: Use the new arguments kegg_out and kegg_rank to create an image of the KEGG pathway with the genes from the enrichment analysis highlighted (thanks to this PR by Noriaki Sato).
- New modules: gget elm and gget diamond
Version ≥ 0.27.9 (August 7, 2023):
- gget enrichr: Use the new argument background_list to provide a list of background genes.
- gget search now also searches Ensembl synonyms (in addition to gene descriptions and names) to return more comprehensive search results (thanks to Samuel Klein for the suggestion).
Version ≥ 0.27.8 (July 12, 2023):
- gget search: Specify the Ensembl release from which information is fetched with the new argument -r / --release.
- Fixed a bug in gget pdb (introduced in version 0.27.5).
Version ≥ 0.27.7 (May 15, 2023):
- Moved the dependencies for the modules gget gpt and gget cellxgene from automatically installed requirements to gget setup.
- Updated gget alphafold dependencies for compatibility with Python >= 3.10.
- Added a census_version argument to gget cellxgene.
Version ≥ 0.27.6 (May 1, 2023) (YANKED due to problems with dependencies -> replaced with version 0.27.7):
- Thanks to a PR by Tomás Di Domenico: gget search can now also query plant 🌱 Ensembl IDs.
- New module: gget cellxgene
Version ≥ 0.27.5 (April 6, 2023):
- Updated gget search to function correctly with the new pandas version 2.0.0 (released on April 3rd, 2023) as well as older versions of pandas.
- Updated gget info with new flags uniprot and ncbi, which allow turning off results from these databases independently to save runtime (note: the flag ensembl_only was deprecated).
- All gget modules now feature a -q / --quiet (Python: verbose=False) flag to turn off progress information.
Version ≥ 0.27.4 (March 19, 2023):
- New module: gget gpt
Version ≥ 0.27.3 (March 11, 2023):
- gget info excludes PDB IDs by default to increase speed (PDB results can be included using the flag --pdb / pdb=True).
Version ≥ 0.27.2 (January 1, 2023):
- Updated gget alphafold to DeepMind's AlphaFold v2.3.0 (including the new arguments multimer_for_monomer and multimer_recycles).
Version ≥ 0.27.0 (December 10, 2022):
- Updated gget alphafold to match recent changes by DeepMind.
- Updated the version number to match gget's creator's age, following a long-standing Pachter lab tradition.
Version ≥ 0.3.13 (November 11, 2022):
- Reduced runtime for gget enrichr and gget archs4 when used with Ensembl IDs.
Version ≥ 0.3.12 (November 10, 2022):
- gget info now also returns subcellular localisation data from UniProt.
- New gget info flag ensembl_only returns only Ensembl results.
- Reduced runtime for gget info and gget seq.
Version ≥ 0.3.11 (September 7, 2022):
- New module: gget pdb
Version ≥ 0.3.10 (September 2, 2022):
- gget alphafold now also returns pLDDT values, so plots can be generated from the output without rerunning the program (also see the gget alphafold FAQ).
Version ≥ 0.3.9 (August 25, 2022):
- Updated the openmm installation instructions for gget alphafold.
Version ≥ 0.3.8 (August 12, 2022):
- Fixed mysql-connector-python version requirements
Version ≥ 0.3.7 (August 9, 2022):
- NOTE: The Ensembl FTP site changed its structure on August 8, 2022. Please upgrade to gget version ≥ 0.3.7 if you use gget ref.
Version ≥ 0.3.5 (August 6, 2022):
- New module: gget alphafold
Version ≥ 0.2.6 (July 7, 2022):
- gget ref now supports plant genomes! 🌱
Version ≥ 0.2.5 (June 30, 2022):
- NOTE: UniProt changed the structure of their API on June 28, 2022. Please upgrade to gget version ≥ 0.2.5 if you use any of the modules querying data from UniProt (gget info and gget seq).
Version ≥ 0.2.3 (June 26, 2022):
- JSON is now the default output format of the command-line interface for modules that previously returned data frame (CSV) format by default (the output can be converted to data frame/CSV using the flag [-csv][--csv]). Data frame/CSV remains the default output for Jupyter Lab / Google Colab (and can be converted to JSON with json=True).
- For all modules, the first required argument was converted to a positional argument and should no longer be named on the command line, e.g., gget ref -s human → gget ref human.
- gget info: [--expand] is deprecated. The module now always returns all of the available information.
- Slight changes to the output returned by gget info, including the return of versioned Ensembl IDs.
- gget info and gget seq now support 🪱 WormBase and 🪰 FlyBase IDs.
- gget archs4 and gget enrichr now also take Ensembl IDs as input with the added flag [-e][--ensembl] (ensembl=True in Jupyter Lab / Google Colab).
- gget seq: The argument seqtype was replaced by the flag [-t][--translate] (translate=True/False in Jupyter Lab / Google Colab), which returns either nucleotide (False) or amino acid (True) sequences.
- gget search: The argument seqtype was renamed to id_type for clarity (it still takes the same values, 'gene' or 'transcript').
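The 0.2.3 command-line changes (the first required argument becomes positional; output is JSON by default, with a -csv/--csv switch) map onto standard argparse patterns. This is a hypothetical sketch modeled loosely on gget ref, not gget's actual parser. The prog name, the `render` helper, and the CSV layout are illustrative assumptions.

```python
import argparse
import csv
import io
import json
from typing import Dict, List


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser shaped like `gget ref`: the species is positional,
    # so `gget ref human` works and the old `-s human` form is not defined.
    parser = argparse.ArgumentParser(prog="gget ref")
    parser.add_argument("species", help="e.g. human")
    parser.add_argument(
        "-csv", "--csv",
        action="store_true",
        help="Return CSV instead of the default JSON.",
    )
    return parser


def render(rows: List[Dict[str, str]], as_csv: bool) -> str:
    """JSON is the default; CSV only when the flag is given."""
    if not as_csv:
        return json.dumps(rows)
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=list(rows[0]) if rows else [], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

Because `--csv` is a store_true flag, it consumes no value, and the positional species can appear before or after it.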