[Biopython] Using Tabix on a bgzf file

Peter Cock p.j.a.cock at googlemail.com
Sun Feb 16 14:32:58 UTC 2014


On Sunday, February 16, 2014, Vishnu Chilakamarri <vishnuc11j93 at gmail.com>
wrote:

> Hi Peter,
>
> I read your code on BGZF compression and the blog post. I used
> uniprot_sprot_varsplic.fasta.gz
> <ftp://ftp.ebi.ac.uk/pub/databases/uniprot/current_release/knowledgebase/complete/uniprot_sprot_varsplic.fasta.gz>
> (from the EBI FTP site) as the example, compressed it with bgzf and then
> indexed it using Tabix. The file I got has a .tbi extension. When I try to
> parse the file I get a "preset not provided" error, and when I try to
> access columns I get an "indexes overlap" error. Can you tell me where
> I've gone wrong?
>
> Thank you,
> Vishnu
>
>
Biopython doesn't (currently) use the tabix index (*.tbi) file.

Biopython's Bio.SeqIO indexing code uses the BGZF compressed
sequence file directly. Using the SeqIO.index(...) function will make
an in-memory index; using SeqIO.index_db(...) will make an index
on disk using SQLite. This system is quite separate from tabix
(and Biopython uses it for many sequence file formats,
not just FASTA).
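
As a rough sketch of the two approaches described above, something
along these lines should work. The file names, the toy records and the
helper function names are made up for illustration; only
bgzf.BgzfWriter, SeqIO.index and SeqIO.index_db are Biopython calls.
It assumes a reasonably recent Biopython with SQLite support
available.

```python
import os
import tempfile

from Bio import SeqIO, bgzf

# Toy FASTA content, invented purely for this illustration.
FASTA_TEXT = (
    ">seq1 first toy record\n"
    "MKTAYIAKQR\n"
    ">seq2 second toy record\n"
    "GAVLIPFMW\n"
)


def write_bgzf_fasta(path, text=FASTA_TEXT):
    """Write FASTA text to a BGZF compressed file (hypothetical helper)."""
    with bgzf.BgzfWriter(path, "wb") as handle:
        handle.write(text.encode("ascii"))


def lookup_in_memory(bgzf_path, record_id):
    """Fetch one sequence via an in-memory index built by SeqIO.index."""
    records = SeqIO.index(bgzf_path, "fasta")
    try:
        return str(records[record_id].seq)
    finally:
        records.close()


def lookup_on_disk(bgzf_path, index_path, record_id):
    """Fetch one sequence via an SQLite index built by SeqIO.index_db."""
    records = SeqIO.index_db(index_path, bgzf_path, "fasta")
    try:
        return str(records[record_id].seq)
    finally:
        records.close()


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    fasta_path = os.path.join(workdir, "toy.fasta.bgz")
    write_bgzf_fasta(fasta_path)
    print(lookup_in_memory(fasta_path, "seq2"))
    print(lookup_on_disk(fasta_path, os.path.join(workdir, "toy.idx"), "seq1"))
```

Note that no *.tbi file is involved in either case: both functions
read the BGZF compressed file itself, and the SQLite index file
can be reused on later runs rather than being rebuilt.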

Peter


