[Bioperl-l] GenBank SeqIO

Hilmar Lapp hilmarl@yahoo.com
Sat, 24 Feb 2001 22:58:50 -0800


I fixed bug #160 (it had a long life :-) by adding pid() to
Bio::Seq::RichSeqI (and of course also to its implementation,
RichSeq.pm) and by adding the corresponding parsing and writing
code to SeqIO::genbank.

Checking the code against the entry given in the bug report revealed
additional bugs:
1) Secondary accessions were ignored. Fixed for both parsing and
writing.
2) The version parsing code was wrong. The version is in fact only
the number appended to the accession number, and the GI number
that may follow it was ignored. I fixed version parsing and
writing, and now store the GI number as primary_id() (which is
what it really is). On output, the GI number is printed as well
if a sequence version is set and primary_id() is a positive
integer.
3) The REFERENCE line was printed with one space too many. For
proteins, the sequence range was given in bases instead of
residues, which is what GenBank uses. Fixed both.
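
The corrected rules in points 1) to 3) can be sketched as plain
Perl string handling. This is a minimal sketch, not the actual
SeqIO::genbank code: the helper names (parse_accession_line,
parse_version_line, format_version_line, format_reference_line)
are hypothetical, and the column layout (a 12-character keyword
field, and a reference number padded so the range starts at column
16) is assumed from typical GenBank entries.

```perl
use strict;
use warnings;

# Hypothetical helper: split an ACCESSION line into the primary
# accession (first token) and the secondary accessions (the rest).
sub parse_accession_line {
    my ($line) = @_;
    $line =~ s/^ACCESSION\s+//;
    my ($primary, @secondary) = split ' ', $line;
    return ($primary, \@secondary);
}

# Hypothetical helper: parse a line such as
#   VERSION     U49845.1  GI:1293613
# The version is only the number after the dot; the GI number,
# if present, is returned separately (undef otherwise).
sub parse_version_line {
    my ($line) = @_;
    if ($line =~ /^VERSION\s+(\S+?)\.(\d+)(?:\s+GI:(\d+))?/) {
        return ($1, $2, $3);
    }
    return (undef, undef, undef);
}

# Hypothetical helper: write a VERSION line. Assumes a version is
# set; the GI is appended only if it is a positive integer.
sub format_version_line {
    my ($acc, $version, $gi) = @_;
    my $line = sprintf("%-12s%s.%d", 'VERSION', $acc, $version);
    $line .= "  GI:$gi" if defined $gi && $gi =~ /^\d+$/ && $gi > 0;
    return $line;
}

# Hypothetical helper: write a REFERENCE line, using "residues" for
# proteins and "bases" otherwise (assumed column layout, see above).
sub format_reference_line {
    my ($num, $start, $end, $alphabet) = @_;
    my $unit = $alphabet eq 'protein' ? 'residues' : 'bases';
    return sprintf("%-12s%-3d(%s %d to %d)",
                   'REFERENCE', $num, $unit, $start, $end);
}

# Example usage
my ($acc, $ver, $gi) = parse_version_line('VERSION     U49845.1  GI:1293613');
print format_version_line($acc, $ver, $gi), "\n";
print format_reference_line(1, 1, 224, 'protein'), "\n";
```

The two print statements should produce
"VERSION     U49845.1  GI:1293613" and
"REFERENCE   1  (residues 1 to 224)".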

Problem 1) applies to the EMBL parser as well, unless EMBL does not
allow multiple accessions. Could someone who is familiar with the
format check this, or point me to a place where I can download
relevant sample entries? (EMBL is nucleotide-only, right? Does it
have sequence versioning, too?)

	Hilmar
-- 
-----------------------------------------------------------------
Hilmar Lapp                              email: hlapp@gmx.net
GNF, San Diego, Ca. 92122                phone: +1 858 812 1757
-----------------------------------------------------------------