[Biopython-dev] [Bug 3000] Could SeqIO.parse() store the whole, unparsed multiline entry?

Peter biopython at maubp.freeserve.co.uk
Sat Mar 13 13:43:53 UTC 2010


On Fri, Mar 12, 2010 at 8:29 PM, Martin MOKREJŠ
<mmokrejs at fold.natur.cuni.cz> wrote:
>
> I finally got back to this. Thank you for all your work.
> I would be glad if one could use the accession without
> the trailing ".1", etc for get_raw() and get(). I think
> just any version of the record should be returned,
> and maybe a list if there were multiple versions of
> the same.

This is just a quick reply to answer this part of your email.

It would be unwise to try to be clever with the key
matching. In this case, yes, for GenBank files we know
what the names mean (accession.version), but this is
not true in general.

In this case the answer for your needs would be to use
the optional key_function argument of Bio.SeqIO.index
to specify the keys, e.g. something like this:

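from Bio import SeqIO

def strip_version(identifier):
    # Drop the trailing ".version", e.g. "AB123456.1" -> "AB123456"
    return identifier.rsplit(".", 1)[0]

# filename is the path to your GenBank file
my_dict = SeqIO.index(filename, "gb", key_function=strip_version)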

That way all the keys will have just the accession
without the version (assuming there are no clashes,
which I think would raise an error).
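
If you are not sure whether your file holds several versions of
the same accession, something like the following rough sketch
could be used to check for clashes up front. It is plain Python
and does not touch Biopython; the helper group_by_accession and
the example identifiers are made up for illustration, and it
assumes every identifier follows the accession.version pattern.

```python
from collections import defaultdict

def strip_version(identifier):
    # Drop the trailing ".version", e.g. "AB123456.1" -> "AB123456".
    # An identifier without a dot is returned unchanged.
    return identifier.rsplit(".", 1)[0]

def group_by_accession(identifiers):
    # Hypothetical helper: map each bare accession to the list of
    # versioned identifiers that share it, in input order.
    groups = defaultdict(list)
    for identifier in identifiers:
        groups[strip_version(identifier)].append(identifier)
    return dict(groups)

# Made-up identifiers, for illustration only
ids = ["AB000001.1", "AB000001.2", "XY000002.1"]
groups = group_by_accession(ids)

# Accessions that would give duplicate keys after stripping the version
clashes = {acc: vers for acc, vers in groups.items() if len(vers) > 1}
```

An empty clashes dictionary would suggest the stripped keys are
unique. Note that an identifier containing a dot that is not a
version separator would be truncated too, which is another reason
not to do this automatically for every file format.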

Peter



