✨ What's new

Version ≥ 0.30.6 (Jun 10, 2026):

  • gget blat: Improved resilience against UCSC BLAT endpoint failures (fixes intermittently failing tests).
    • Added retry-with-exponential-backoff for transient failures (HTTP 429/5xx, network errors, and non-JSON 200 responses caused by UCSC rate-limiting or HTML error pages). Up to 4 attempts with 1.5s → 3s → 6s backoff.
    • Replaced the misleading "sequence too short or assembly invalid" message with the actual server response (status code, response preview) so failures are diagnosable.
    • HTTPError and URLError are now caught explicitly instead of bubbling up as unhandled exceptions.
  • Bug fixes:
    • gget cosmic: Fixed a misleading error message when the download step fails; it was reporting the previous command's return code/stderr instead of the failing command's.
    • gget cosmic: Narrowed the JSON parse exception handler to json.JSONDecodeError so unrelated ValueErrors are no longer masked by the "Failed to download file" message.
    • gget --version, gget --help, gget invoked with no arguments, and gget <module> with no further arguments now all exit with status 0 instead of 1, so CI scripts and shell pipelines no longer treat these informational outputs as failures.
    • Added request timeouts to previously-unguarded requests calls in gget ref, gget info, gget 8cube, gget enrichr, and gget opentargets. Default is 10s connect / 60s read; configurable via the new DEFAULT_REQUESTS_TIMEOUT constant.
    • Narrowed a bare except: in utils.get_uniprot_seqs to (KeyError, IndexError, TypeError) so unrelated errors (including KeyboardInterrupt) are no longer swallowed.
    • Added utils.http_json() and utils.dig() helpers that issue a request and parse JSON / walk a nested response path with consistent error reporting. Migrated gget bgee, gget opentargets, and one .json() callsite in gget virus to use them; remaining modules will migrate opportunistically. Upstream HTML error pages, malformed JSON, and missing response keys now surface as clear RuntimeErrors naming the failing service instead of cryptic JSONDecodeError / KeyError tracebacks.
    • utils.http_json() now retries transient failures (connection errors, read timeouts, HTTP 5xx) up to 3 times with exponential backoff. Smooths over short upstream blips (e.g. bgee.org read timeouts) without affecting 4xx errors, which still raise immediately.
    • gget virus: Replaced 11 bare except: pass blocks around file.close() / os.remove() cleanup calls with narrowed except OSError handlers that log the failure at DEBUG. Previously, real I/O issues during cleanup (disk full, permissions) were silently dropped and the cleanup path also swallowed KeyboardInterrupt.
    • gget cbio: Fixed a code path in cbio_plot that called the removed-in-pandas-2.0 DataFrame.append() inside a loop when filling missing CNA genes — the entire branch crashed on modern pandas. It now builds a single DataFrame of missing rows and concatenates once.
  • Performance:
    • utils.get_uniprot_seqs: Collect per-ID DataFrames in a list and pd.concat(..., ignore_index=True) once at the end, avoiding the O(n²) cost of growing a DataFrame inside the request loop.
    • Cached utils.find_latest_ens_rel, utils.search_species_options, utils.ref_species_options, and utils.find_nv_kingdom with functools.lru_cache. These hit Ensembl FTP listings that are stable for a release; repeated calls within one Python process are now free.
    • Added utils.parallel_map, a thin ThreadPoolExecutor wrapper for I/O-bound work. It is used to fan out utils.get_uniprot_seqs across the input ID list, so looking up N IDs now costs roughly N / pool_size sequential UniProt round-trips instead of N. Pool size defaults to 8 and can be overridden via the GGET_MAX_WORKERS environment variable.
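
The retry behavior described above for gget blat and utils.http_json follows the standard retry-with-exponential-backoff pattern. The sketch below illustrates it with the standard library only; it is not gget's code. The names retry_with_backoff, is_transient, and TransientError are hypothetical, and the defaults simply mirror the 4 attempts and 1.5s → 3s → 6s schedule quoted for gget blat. Because utils.http_json is described as raising every 4xx error immediately, a variant matching it would drop the 429 check.

```python
import time
import urllib.error


class TransientError(Exception):
    """Hypothetical marker for a failure worth retrying (e.g. an HTML error page returned with status 200)."""


def is_transient(exc: Exception) -> bool:
    """Assumed policy: retry HTTP 429, HTTP 5xx, and network errors; everything else fails at once."""
    # HTTPError is a subclass of URLError, so it must be checked first.
    if isinstance(exc, urllib.error.HTTPError):
        return exc.code == 429 or 500 <= exc.code < 600
    return isinstance(exc, (urllib.error.URLError, TimeoutError, TransientError))


def retry_with_backoff(func, attempts=4, base_delay=1.5, sleep=time.sleep):
    """Call func(); after each transient failure wait base_delay * 2**i, then try again."""
    for attempt in range(attempts):
        try:
            return func()
        except Exception as exc:
            if not is_transient(exc) or attempt == attempts - 1:
                raise  # permanent error, or no attempts left
            sleep(base_delay * 2 ** attempt)  # 1.5 s, 3 s, 6 s with the defaults


# Usage: a fake request that fails twice before succeeding; waits are recorded, not slept.
calls = []
waits = []


def flaky_request():
    calls.append(1)
    if len(calls) < 3:
        raise TransientError("simulated HTML error page")
    return {"status": "ok"}


print(retry_with_backoff(flaky_request, sleep=waits.append), waits)
# {'status': 'ok'} [1.5, 3.0]
```

Injecting sleep keeps the schedule observable without actually waiting; in real use the default time.sleep applies.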

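Collecting per-ID results and concatenating once, combined with a small thread pool, can be sketched as follows. Growing a DataFrame inside a loop copies every row accumulated so far on each iteration, so n rows cost O(n²) copying, while a single pd.concat over a list of frames is linear. A thread pool helps because real lookups spend most of their time waiting on the network. The helper thread_map and the offline stand-in fetch_one are hypothetical; the sketch assumes only the pool-size behavior stated above (default 8, overridable via GGET_MAX_WORKERS), not the actual signature of utils.parallel_map.

```python
import os
from concurrent.futures import ThreadPoolExecutor

import pandas as pd


def thread_map(func, items, max_workers=None):
    """Hypothetical parallel_map-style helper: run func over items on a thread pool.

    Results come back in input order. The pool size defaults to 8 and can be
    overridden with the GGET_MAX_WORKERS environment variable (assumed behavior).
    """
    if max_workers is None:
        max_workers = int(os.environ.get("GGET_MAX_WORKERS", "8"))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(func, items))


def fetch_one(uniprot_id):
    """Stand-in for one UniProt request (no network): returns a one-row DataFrame."""
    return pd.DataFrame({"uniprot_id": [uniprot_id], "id_length": [len(uniprot_id)]})


ids = ["P01308", "P68871", "Q9Y6K9"]
frames = thread_map(fetch_one, ids)        # list of per-ID DataFrames, in input order
df = pd.concat(frames, ignore_index=True)  # one concatenation instead of one per iteration
print(df)
```
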
Version ≥ 0.30.5 (May 23, 2026):

  • gget opentargets: Rewrote this module to reflect the new Open Targets API structure
    • Some output column/key names may differ to reflect the new API structure
    • Removed the --filter_mode argument
  • gget blast: Fixed compatibility with newer pandas versions (≥ 2.0) where pd.read_html() no longer accepts raw HTML strings directly, causing a FileNotFoundError / OSError: Filename too long error when parsing BLAST results
  • gget cosmic: Added overwrite and gzip arguments to internal functions.

Version ≥ 0.30.3 (Feb 26, 2026):

  • gget virus: New filtering options, quiet mode, and improved download reliability
    • Added --segment filter for segmented viruses (e.g., Influenza A segments like 'HA', 'NA', 'PB1')
    • Added --vaccine_strain filter to include or exclude vaccine strain sequences
    • Added --source_database filter to select sequences from 'genbank' or 'refseq' (replaces refseqOnly)
    • Added -q / --quiet flag to suppress progress information
    • Extended fallback strategies for improved download reliability on large datasets
    • Command summary file now includes software version

Version ≥ 0.30.2 (Feb 08, 2026):

  • gget virus: Metadata streaming optimization, improved protein filtering, and enhanced error handling and retry logic
    • Metadata now streams to disk during fetch to prevent memory exhaustion on large datasets (100,000+ records)
    • Fixed metadata CSV mapping (camelCase → snake_case) for organism name, host, and collection date
    • Enhanced protein filtering for segmented viruses with improved FASTA header parsing
    • Added annotated=False option for filtering unannotated sequences
    • Added progress bars to batched sequence downloads
    • Fixed collection date naming bug
    • Improved error messages for invalid filter dates
    • Added enhanced retry attempts for virus name resolution
    • Added verbosity to the Influenza A and COVID-19 checking steps

Version ≥ 0.30.0 (Jan 19, 2026):

  • NEW MODULES:
  • SECURITY IMPROVEMENTS:
    • Replaced os.system() calls that interpolated URLs from external APIs into f-strings in gget/main.py
    • Replaced exec() with importlib.import_module() in gget setup for safer dynamic imports
    • Replaced shell=True subprocess calls with list-based arguments in gget muscle, gget diamond, and gget setup to prevent command injection
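
The difference between shell=True and list-based arguments can be demonstrated without running any real tool. In this sketch the child process is just the current Python interpreter printing the arguments it received, and some_tool in the commented-out line is a placeholder, not muscle, DIAMOND, or any gget command.

```python
import subprocess
import sys

# Untrusted text, e.g. a value that originated from an external API or user input.
untrusted = "input.fa; echo INJECTED"

# Risky (not run here): with shell=True, /bin/sh parses the string and ';' starts a second command.
#   subprocess.run(f"some_tool {untrusted}", shell=True)

# Safer: pass a list. No shell is involved, so the whole value arrives as ONE argument.
# The "tool" here is the current Python interpreter, which prints the arguments it received.
cmd = [sys.executable, "-c", "import sys; print(sys.argv[1:])", untrusted]
result = subprocess.run(cmd, capture_output=True, text=True, check=True)
print(result.stdout.strip())  # ['input.fa; echo INJECTED']
```
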

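Likewise, the exec() to importlib.import_module() change can be sketched with a standard-library module. The module name json is only an example; which modules gget setup actually imports is not reproduced here.

```python
import importlib

# Name of a module to load dynamically (e.g. one picked from a fixed list of supported tools).
module_name = "json"

# Risky (not run here): exec(f"import {module_name}") executes whatever code the string contains.
# Safer: import_module only resolves a module name and returns the module object.
mod = importlib.import_module(module_name)
print(mod.dumps({"ok": True}))  # {"ok": true}

# A string that smuggles in extra code is treated as a (nonexistent) module name, not executed.
try:
    importlib.import_module("json; print('injected')")
    rejected = False
except ModuleNotFoundError:
    rejected = True
print(rejected)  # True
```
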
Version ≥ 0.29.3 (Sep 11, 2025):

Version ≥ 0.29.2 (Jul 03, 2025):

  • gget can now be installed using uv pip install gget
    • All package metadata (version, author, description, etc.) is now managed in setup.cfg for full compatibility with modern tools like uv, pip, and PyPI
    • gget now uses a minimal setup.py and is fully PEP 517/518 compatible
  • gget setup will now try to use uv pip install first for speed and modern dependency resolution, and fall back onto pip install if uv fails or is not available
    • Users are informed at each step which installer is being used and if a retry is happening
    • Note: Some scientific dependencies (e.g., cellxgene-census) may not yet support Python 3.12. If you encounter installation errors, try using Python 3.9 or 3.10. (The pip installation might also still succeed in these cases.)
  • All required dependencies are now listed in setup.cfg under install_requires. Installing gget with pip install . or uv pip install . will automatically install all dependencies
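
The uv-first, pip-fallback logic described above for gget setup might look roughly like the sketch below. install_with_fallback is a hypothetical helper, not gget's implementation; its run parameter exists only so the example can be exercised with a fake runner instead of installing anything.

```python
import shutil
import subprocess
import sys


def install_with_fallback(package, run=subprocess.run):
    """Hypothetical helper: try `uv pip install`, fall back to `python -m pip install`."""
    candidates = []
    if shutil.which("uv"):  # only attempt uv when it is on PATH
        candidates.append(("uv", ["uv", "pip", "install", package]))
    candidates.append(("pip", [sys.executable, "-m", "pip", "install", package]))

    last_error = None
    for name, cmd in candidates:
        print(f"Installing {package} with {name}...")  # tell the user which installer is used
        try:
            run(cmd, check=True)
            return name
        except (subprocess.CalledProcessError, FileNotFoundError) as exc:
            print(f"{name} failed ({exc}); trying the next installer.")
            last_error = exc
    raise RuntimeError(f"Could not install {package}") from last_error


# Usage without touching the network: a fake runner in which uv always fails.
def fake_run(cmd, check):
    if cmd[0] == "uv":
        raise subprocess.CalledProcessError(returncode=1, cmd=cmd)
    return subprocess.CompletedProcess(cmd, 0)


print(install_with_fallback("gget", run=fake_run))  # pip
```
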

Version ≥ 0.29.1 (Apr 21, 2025):

  • gget mutate:
    • gget mutate has been simplified to focus on taking as input a list of mutations and an associated reference genome with corresponding annotation information, and producing as output the sequences with the mutation incorporated plus a short region of surrounding context. For the full functionality of the previous version and how it integrates into a novel variant screening pipeline, visit the varseek repository being developed by members of the gget team at https://github.com/pachterlab/varseek.git.
    • Added additional information to returned data frames as described here: https://github.com/pachterlab/gget/pull/169
  • gget cosmic:
    • Major restructuring of the gget cosmic module to adhere to new login requirements set by COSMIC
    • New arguments email and password were added so users can pass their login credentials directly, without interactive input, when downloading data
    • Default changed: gget_mutate=False
    • Deprecated argument: entity
    • Argument mutation_class is now cosmic_project
  • gget bgee:
    • type="orthologs" is now the default, removing the need to specify the type argument when calling orthologs
    • Now allows querying multiple genes at once.
  • gget diamond:
    • Now supports translated alignment of nucleotide sequences to amino acid reference sequences using the --translated flag.
  • gget elm:
    • Improved server error handling.

Version ≥ 0.29.0 (Sep 25, 2024):

  • New modules:
  • gget enrichr now also supports species other than human and mouse (fly, yeast, worm, and fish) via modEnrichR
  • gget mutate:
    gget mutate will now merge identical sequences in the final file by default. Mutation creation was vectorized to decrease runtime. Improved the flanking sequence check for non-substitution mutations to make sure no wildtype kmer is retained in the mutation-containing sequence. Added several new arguments to customize sequence generation and output.
  • gget cosmic:
    Added support for targeted as well as gene screens. The CSV file created for gget mutate now also contains protein mutation info.
  • gget ref:
    Added out file option.
  • gget info and gget seq:
    Switched to Ensembl POST API to increase speed (nothing changes in front end).
  • Other "behind the scenes" changes:

Version ≥ 0.28.6 (Jun 2, 2024):

  • New module: gget mutate
  • gget cosmic: You can now download entire COSMIC databases using the download_cosmic argument
  • gget ref: Can now fetch the GRCh37 genome assembly using species='human_grch37'
  • gget search: Adjust access of human data to the structure of Ensembl release 112 (fixes issue 129)

Version ≥ 0.28.5 (May 29, 2024):

  • Yanked due to a logging bug in gget.setup("alphafold") and a bug in gget mutate where inversion mutations only reversed the string instead of also computing the complementary strand
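
For context, an inversion replaces the affected segment with its reverse complement, not merely its reversal. A minimal sketch, assuming 0-based, end-exclusive coordinates (a simplification; coordinate handling in gget mutate itself is not modeled here); reverse_complement and apply_inversion are hypothetical names:

```python
# Translation table for DNA complements (upper and lower case; N stays N).
COMPLEMENT = str.maketrans("ACGTacgtNn", "TGCAtgcaNn")


def reverse_complement(seq: str) -> str:
    """Complement each base, then reverse the sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def apply_inversion(seq: str, start: int, end: int) -> str:
    """Invert seq[start:end] (0-based, end-exclusive) by replacing it with its reverse complement."""
    return seq[:start] + reverse_complement(seq[start:end]) + seq[end:]


ref = "AATTCCGG"
print(ref[:2] + ref[2:6][::-1] + ref[6:])  # reversal only (the bug): AACCTTGG
print(apply_inversion(ref, 2, 6))          # reverse complement: AAGGAAGG
```
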

Version ≥ 0.28.4 (January 31, 2024):

  • gget setup: Fix bug with filepath when running gget.setup("elm") on Windows OS.

Version ≥ 0.28.3 (January 22, 2024):

  • gget search and gget ref now also support fungi 🍄, protists 🌝, and invertebrate metazoa 🐝 🐜 🐌 🐙 (in addition to vertebrates and plants)
  • New module: gget cosmic
  • gget enrichr: Fix duplicate scatter dots in plot when pathway names are duplicated
  • gget elm:
    • Changed ortho results column name 'Ortholog_UniProt_ID' to 'Ortholog_UniProt_Acc' to correctly reflect the column contents, which are UniProt Accessions. 'UniProt ID' was changed to 'UniProt Acc' in the documentation for all gget modules.
    • Changed ortho results column name 'motif_in_query' to 'motif_inside_subject_query_overlap'.
    • Added interaction domain information to results (new columns: "InteractionDomainId", "InteractionDomainDescription", "InteractionDomainName").
    • The regex string for regular expression matches was encapsulated as follows: "(?=(regex))" (instead of directly passing the regex string "regex") to enable capturing all occurrences of a motif when the motif length is variable and there are repeats in the sequence (https://regex101.com/r/HUWLlZ/1).
  • gget setup: Use the out argument to specify a directory the ELM database will be downloaded into. Completes this feature request.
  • gget diamond: The DIAMOND command is now run with --ignore-warnings flag, allowing niche sequences such as amino acid sequences that only contain nucleotide characters and repeated sequences. This is also true for DIAMOND alignments performed within gget elm.
  • gget ref and gget search back-end change: the current Ensembl release is fetched from the new release file on the Ensembl FTP site to avoid errors during uploads of new releases.
  • gget search:
    • FTP link results (--ftp) are saved in txt file format instead of json.
    • Fix URL links to Ensembl gene summary for species with a subspecies name and invertebrates.
  • gget ref:
    • Back-end changes to increase speed
    • New argument: list_iv_species to list all available invertebrate species (can be combined with the release argument to fetch all species available from a specific Ensembl release)
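
The effect of wrapping a pattern as "(?=(regex))" can be seen with Python's re module. Because the lookahead consumes no characters, the scan can start a new match at every position, so overlapping occurrences are reported, and the capturing group holds the matched motif. The motif P.{1,3}P and the sequence PAPAP are made up for illustration and are not ELM entries.

```python
import re

seq = "PAPAP"
motif = "P.{1,3}P"  # made-up variable-length motif: P, 1 to 3 residues, P

# Plain search: each match consumes its characters, so overlapping hits are skipped.
plain = [(m.start(), m.group()) for m in re.finditer(motif, seq)]

# Lookahead wrapper: zero-width match at every start position; group 1 captures the motif.
overlapping = [(m.start(), m.group(1)) for m in re.finditer(f"(?=({motif}))", seq)]

print(plain)        # [(0, 'PAPAP')]
print(overlapping)  # [(0, 'PAPAP'), (2, 'PAP')]
```

The lookahead still returns one (greedy) match per start position: the shorter PAP starting at position 0 is not reported separately.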

Version ≥ 0.28.2 (November 15, 2023):

  • gget info: Return a logging error message when the NCBI server fails for a reason other than a fetch fail (this is an error on the server side rather than an error with gget)
  • Replace the deprecated 'text' argument in all calls to find()-type methods of the BeautifulSoup dependency
  • gget elm: Remove false positive and true negative instances from returned results
  • gget elm: Add expand argument

Version ≥ 0.28.0 (November 5, 2023):

Version ≥ 0.27.9 (August 7, 2023):

  • gget enrichr: Use new argument background_list to provide a list of background genes
  • gget search now also searches Ensembl synonyms (in addition to gene descriptions and names) to return more comprehensive search results (thanks to Samuel Klein for the suggestion)

Version ≥ 0.27.8 (July 12, 2023):

  • gget search: Specify the Ensembl release from which information is fetched with new argument -r --release
  • Fixed bug in gget pdb (this bug was introduced in version 0.27.5)

Version ≥ 0.27.7 (May 15, 2023):

Version ≥ 0.27.6 (May 1, 2023) (YANKED due to problems with dependencies; replaced by version 0.27.7):

Version ≥ 0.27.5 (April 6, 2023):

  • Updated gget search to function correctly with new Pandas version 2.0.0 (released on April 3rd, 2023) as well as older versions of Pandas
  • Updated gget info with new flags uniprot and ncbi which allow turning off results from these databases independently to save runtime (note: flag ensembl_only was deprecated)
  • All gget modules now feature a -q / --quiet (Python: verbose=False) flag to turn off progress information

Version ≥ 0.27.4 (March 19, 2023):

Version ≥ 0.27.3 (March 11, 2023):

  • gget info excludes PDB IDs by default to increase speed (PDB results can be included using flag --pdb / pdb=True).

Version ≥ 0.27.2 (January 1, 2023):

Version ≥ 0.27.0 (December 10, 2022):

  • Updated gget alphafold to match recent changes by DeepMind
  • Updated the version number to match the age of gget's creator, following a long-standing Pachter lab tradition

Version ≥ 0.3.13 (November 11, 2022):

Version ≥ 0.3.12 (November 10, 2022):

  • gget info now also returns subcellular localisation data from UniProt
  • New gget info flag ensembl_only returns only Ensembl results
  • Reduced runtime for gget info and gget seq

Version ≥ 0.3.11 (September 7, 2022):

Version ≥ 0.3.10 (September 2, 2022):

Version ≥ 0.3.9 (August 25, 2022):

Version ≥ 0.3.8 (August 12, 2022):

  • Fixed mysql-connector-python version requirements

Version ≥ 0.3.7 (August 9, 2022):

  • NOTE: The Ensembl FTP site changed its structure on August 8, 2022. Please upgrade to gget version ≥ 0.3.7 if you use gget ref

Version ≥ 0.3.5 (August 6, 2022):

Version ≥ 0.2.6 (July 7, 2022):

  • gget ref now supports plant genomes! 🌱

Version ≥ 0.2.5 (June 30, 2022):

  • NOTE: UniProt changed the structure of their API on June 28, 2022. Please upgrade to gget version ≥ 0.2.5 if you use any of the modules querying data from UniProt (gget info and gget seq).

Version ≥ 0.2.3 (June 26, 2022):

  • JSON is now the default output format for the command-line interface for modules that previously returned data frame (CSV) format by default (the output can be converted to data frame/CSV using flag [-csv][--csv]). Data frame/CSV remains the default output for Jupyter Lab / Google Colab (and can be converted to JSON with json=True).
  • For all modules, the first required argument was converted to a positional argument and should no longer be named in the command line, e.g. gget ref -s human → gget ref human.
  • gget info: [--expand] is deprecated. The module will now always return all of the available information.
  • Slight changes to the output returned by gget info, including the return of versioned Ensembl IDs.
  • gget info and gget seq now support 🪱 WormBase and 🪰 FlyBase IDs.
  • gget archs4 and gget enrichr now also take Ensembl IDs as input with added flag [-e][--ensembl] (ensembl=True in Jupyter Lab / Google Colab).
  • gget seq argument seqtype was replaced by flag [-t][--translate] (translate=True/False in Jupyter Lab / Google Colab) which will return either nucleotide (False) or amino acid (True) sequences.
  • gget search argument seqtype was renamed to id_type for clarity (still taking the same arguments 'gene' or 'transcript').
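
The positional-argument convention and the JSON-by-default output with an [-csv][--csv] switch can be illustrated with argparse. The parser below is hypothetical (its prog name and argument set are made up) and does not reproduce gget's actual command-line parsers.

```python
import argparse

# Hypothetical parser: the first required argument is positional, not a named flag.
parser = argparse.ArgumentParser(prog="demo")
parser.add_argument("species", help="first required argument, given without a flag")
parser.add_argument("-csv", "--csv", action="store_true",
                    help="return CSV instead of the default JSON")

args = parser.parse_args(["human"])  # positional, as in `gget ref human`
print(args.species, args.csv)        # human False

args_csv = parser.parse_args(["human", "-csv"])
print(args_csv.species, args_csv.csv)  # human True
```
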