[Bioperl-l] Bio::SeqIO::genbank seems to munge SOURCE and ORGANISM lines

Geoff Purdy geoff_purdy at yahoo.com
Thu Feb 20 12:53:31 EST 2003


I've noticed some more strange behavior when reading
and printing GenBank files using Bio::SeqIO.

It appears that the SOURCE and ORGANISM lines are
being corrupted by BioPerl in some records.  Below is
an example using GenBank accession AE016800 (for ease
of reading, I've removed the unaffected surrounding
lines):

From the original GenBank file:
SOURCE      Vibrio vulnificus CMCP6
  ORGANISM  Vibrio vulnificus CMCP6


From the output after reading in and printing out with
Bio::SeqIO:
SOURCE      Vibrio vulnificus CMCP6 CMCP6.
  ORGANISM  Vibrio vulnificus

You can see that CMCP6 was dropped from the ORGANISM
line and appended (along with a mystery period) to the
SOURCE line.

Is this a known issue, or should I submit it to
bugzilla?

Here is the code snippet that I used for reading the
file and writing it back out:

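use strict;
use warnings;
use Bio::SeqIO;

# Read the GenBank record through a tied filehandle.
my $in  = Bio::SeqIO->newFh(-file => "AE016800.gb",
                            '-format' => 'Genbank');
# No -file is given, so the output stream writes to STDOUT.
my $out = Bio::SeqIO->newFh('-format' => 'Genbank');

# Round-trip each record: read it in and print it straight back out.
print $out $_ while <$in>;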



