[Biopython-dev] [Bug 2425] Fasta ID parsing error

bugzilla-daemon at portal.open-bio.org bugzilla-daemon at portal.open-bio.org
Fri Sep 26 12:44:16 UTC 2008


http://bugzilla.open-bio.org/show_bug.cgi?id=2425





------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk  2008-09-26 08:44 EST -------
(In reply to comment #1)
> I assume in your example you expected "region1.fasta.screen.Contig1" to be
> used as the record key in BioSQL?  There is a 40 character limit on this
> field, which should be fine for most FASTA identifiers.

In BioSQL v1.0.1, the bioentry.accession and dbxref.accession fields were
widened from 40 to 128 characters.  See
http://lists.open-bio.org/pipermail/biosql-l/2008-August/001311.html

However, bioentry.name is still only 40 characters.

It looks like, for a FASTA file such as this:

>gi|9629357|ref|NC_001802.1| Human immunodeficiency virus type 1, complete genome
GGTCTCTCTGGTTAGACCAGATCTGAGCCTGGGAGCTCTCTGGCTAACTAGGGAACCCACTGCTTAAGCC
TCAATAAAGCTTGCCTTGAGTGCTTCAAGTAGTGTGTGCCCGTCTGTTGTGTGACTCTGGTAACTAGAGA
...

BioPerl will use "gi|9629357|ref|NC_001802.1|" as both bioentry.name and
bioentry.identifier, "Human immunodeficiency virus type 1, complete genome"
as bioentry.description, and 0 as the version (the BioSQL convention when
unknown), leaving bioentry.taxon_id and bioentry.division as NULL.
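
To make the mapping above concrete, here is a minimal Python sketch of it. It
is not BioPerl's or Biopython's actual loader code: the function name
fasta_header_to_bioentry and the returned dict layout are hypothetical, chosen
only for illustration. It assumes the identifier is everything up to the first
whitespace on the title line, and it checks that value against the 40
character bioentry.name limit mentioned earlier.

```python
# Hypothetical helper for illustration only; not part of BioPerl, Biopython or BioSQL.
NAME_MAX_LEN = 40  # width of bioentry.name as stated in this thread


def fasta_header_to_bioentry(header):
    """Split a FASTA title line into the bioentry values described above."""
    # Drop the leading ">" if present, then split on the first run of whitespace.
    title = header[1:] if header.startswith(">") else header
    parts = title.strip().split(None, 1)
    identifier = parts[0] if parts else ""
    description = parts[1] if len(parts) > 1 else ""

    # Assumption: an over-long name is reported rather than silently truncated.
    if len(identifier) > NAME_MAX_LEN:
        raise ValueError(
            "identifier %r is %d characters, but bioentry.name holds only %d"
            % (identifier, len(identifier), NAME_MAX_LEN)
        )

    return {
        "name": identifier,
        "identifier": identifier,
        "description": description,
        "version": 0,      # BioSQL convention when the version is unknown
        "taxon_id": None,  # stored as NULL
        "division": None,  # stored as NULL
    }


record = fasta_header_to_bioentry(
    ">gi|9629357|ref|NC_001802.1| Human immunodeficiency virus type 1, complete genome"
)
```

Here record["name"] and record["identifier"] both hold the text before the
first space, and record["description"] holds the rest of the title line.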




