[Biopython] Bio.SeqIO.index() - gzip support and/or index stored on disk?
Peter
biopython at maubp.freeserve.co.uk
Fri Jun 4 12:59:22 UTC 2010
On Fri, Jun 4, 2010 at 11:53 AM, Kevin <aboulia at gmail.com> wrote:
> I vote for sqlite index. Have been using bsddb to do the same but the db
> is inflated compared to plain text. Performance is not bad using btree
The other major point against bsddb is that future versions of Python
will not include it in the standard library - but Python 2.5+ does have
sqlite3 included.
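
To make the sqlite idea concrete, here is a minimal sketch of an on-disk index that maps record identifiers to file offsets using only the standard library sqlite3 module. The table name, column names and helper functions are made up for illustration; this is not an existing Biopython API.

```python
import sqlite3


def save_offsets(con, offsets):
    """Store a {record_id: byte_offset} mapping in an SQLite database.

    `con` is an open sqlite3 connection. The table and column names are
    hypothetical, chosen only for this sketch.
    """
    with con:  # commits on success, rolls back on error
        con.execute(
            "CREATE TABLE IF NOT EXISTS record_index ("
            "record_id TEXT PRIMARY KEY, file_offset INTEGER NOT NULL)"
        )
        con.executemany(
            "INSERT OR REPLACE INTO record_index (record_id, file_offset) "
            "VALUES (?, ?)",
            offsets.items(),
        )


def lookup_offset(con, record_id):
    """Return the stored byte offset for record_id, or raise KeyError."""
    row = con.execute(
        "SELECT file_offset FROM record_index WHERE record_id = ?",
        (record_id,),
    ).fetchone()
    if row is None:
        raise KeyError(record_id)
    return row[0]
```

Passing a connection opened on a file (for example sqlite3.connect("uniprot_sprot.idx"), a made-up filename) keeps the index on disk between runs, so the sequence file would only need to be scanned once.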
> For gzip I feel it might be possible to gunzip into a stream which
> biopython can parse on the fly?
Yes of course, like this:
import gzip
from Bio import SeqIO

# Open in text mode ("rt") so the parser gets str lines under Python 3
with gzip.open("uniprot_sprot.dat.gz", "rt") as handle:
    for record in SeqIO.parse(handle, "swiss"):
        print(record.id)
Parsing is easy - the point of this discussion is random access to
any record within the stream (which requires jumping to an offset).
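
To illustrate what "jumping to an offset" means, here is a rough sketch for an uncompressed Swiss-Prot style file. It assumes each record starts with an "ID   " line and ends with a "//" line, and it keys on the entry name from the ID line, which is a simplification. The function names are invented for this example and are not how Bio.SeqIO.index() is implemented.

```python
def build_offsets(path):
    """Scan the file once and map each entry name to the byte offset
    of its 'ID' line. Binary mode keeps tell() values exact."""
    offsets = {}
    with open(path, "rb") as handle:
        while True:
            start = handle.tell()
            line = handle.readline()
            if not line:
                break  # end of file
            if line.startswith(b"ID   "):
                entry_name = line[5:].split()[0].decode("ascii")
                offsets[entry_name] = start
    return offsets


def read_record(path, offset):
    """Seek straight to offset and return the raw text of one record,
    up to and including its '//' terminator line."""
    lines = []
    with open(path, "rb") as handle:
        handle.seek(offset)
        for line in handle:
            lines.append(line)
            if line.rstrip() == b"//":
                break
    return b"".join(lines).decode("ascii")
```

On a plain file the seek() call is cheap. On a handle from gzip.open() a seek is emulated by decompressing the data up to the target position, which is why gzip support is the difficult part here.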
Peter