[Bioperl-l] Problem writing sequence features for GenBank/GenPept sequences (file: Bio/SeqIO/genbank.pm)

Hilmar Lapp hlapp at gmx.net
Sat Apr 16 15:23:59 EDT 2005


This is a known problem in the 1.5.0 release. Do one of the following:

	- downgrade to 1.4.0 (preferably using the 1.4 branch CVS head, but 
depending on what you want to do the stock 1.4.0 may do fine), or

	- upgrade to the main trunk CVS head.

Hth,

	-hilmar
	
On Saturday, April 16, 2005, at 10:12  AM, sgoegel at gmail.com wrote:

> write_sequence does not write genbank sequences correctly.
> Specifically, features appear as shown below.
>
>  I traced the problem write_seq in SeqIO::genbank.pm,
> and then to the $self->_print_GenBank_FTHelper($fth); line within the 
> foreach
> my $fth ( @fth ) loop.
>
> Here's what I'm actually doing within my script:
> use Bio::Perl;
> use Bio::DB::GenBank;
> use Bio::DB::GenPept;
> use Bio::Seq;
> use Bio::SeqIO;
> # --- cut ---
> my $seq = $gp->get_Seq_by_gi($id); # where $gp is a Bio::DB::GenPept 
> object
> and $id is a valid gi num
> # ...
> write_sequence(">$filename", "genbank", $seq);
>
> this results in a mostly-correct looking genpept output file, but the 
> feature
> section looks like so:
> FEATURES             Location/Qualifiers
>      source          1..194
>                      
> /db_xref="Bio::Annotation::SimpleValue=HASH(0x87eecbc)"
>                      
> /organism="Bio::Annotation::SimpleValue=HASH(0x87eee00)"
>      gene            1..194
>
>  /locus_tag="Bio::Annotation::SimpleValue=HASH(0x87eefe0)"
>  /gene="Bio::Annotation::SimpleValue=HASH(0x87f0728)" Protein         
> 1..194
>
>  /locus_tag="Bio::Annotation::SimpleValue=HASH(0x87f07ac)"
>  /gene="Bio::Annotation::SimpleValue=HASH(0x87f0908)"
>  /product="Bio::Annotation::SimpleValue=HASH(0x87f0914)" Region
>  15..42
>
>  /locus_tag="Bio::Annotation::SimpleValue=HASH(0x87f0a04)"
>  /region_name="Bio::Annotation::SimpleValue=HASH(0x87f1c4c) "
>                      
> /evidence=Bio::Annotation::SimpleValue=HASH(0x87f1cb8)
>                      
> /gene="Bio::Annotation::SimpleValue=HASH(0x87f1d00)"
>                      
> /note="Bio::Annotation::SimpleValue=HASH(0x87f1d48)"
>      Region          50..99
>
>  /locus_tag="Bio::Annotation::SimpleValue=HASH(0x87f1dd8)"
>  /region_name="Bio::Annotation::SimpleValue=HASH(0x87f1ed4) "
>                      
> /evidence=Bio::Annotation::SimpleValue=HASH(0x87f1f40)
>                      
> /gene="Bio::Annotation::SimpleValue=HASH(0x87f2e2c)"
>                      
> /note="Bio::Annotation::SimpleValue=HASH(0x87f2e74)"
>      Site            118
>
>  /locus_tag="Bio::Annotation::SimpleValue=HASH(0x87f2f04)"
>  /evidence=Bio::Annotation::SimpleValue=HASH(0x87f3000)
>  /gene="Bio::Annotation::SimpleValue=HASH(0x87f306c)"
>  /site_type="Bio::Annotation::SimpleValue=HASH(0x87f30fc)"
>  /note="Bio::Annotation::SimpleValue=HASH(0x87f30b4)"
>
>
> !!
>
> As I said, I traced this problem to write_seq in the 
> Bio/SeqIO/genbank.pm
> file, which calls $self->_print_GenBank_FTHelper($fth);
>
> This happened for all of about 4000 nucleotide sequences I downloaded 
> in
> genbank format, as well as a protein sequence I tested.
>
> I fixed the problem with the following modification to
>  _print_GenBank_FTHelper: (modified lines marked with ### comments)
>
> sub _print_GenBank_FTHelper {
>    my ($self,$fth) = @_;
>
>    if( ! ref $fth || ! $fth->isa('Bio::SeqIO::FTHelper') ) {
>        $fth->warn("$fth is not a FTHelper class. Attempting to print, 
> but
>  there could be tears!");
>    }
>    if( defined $fth->key &&
>        $fth->key eq 'CONTIG' ) {
>        $self->_show_dna(0);
>        $self->_write_line_GenBank_regex(sprintf("%-12s",$fth->key),
> 					' 'x12,$fth->loc,"\,\|\$",80);
>    } else {
>        $self->_write_line_GenBank_regex(sprintf("     
> %-16s",$fth->key),
> 					" "x21,
> 					$fth->loc,"\,\|\$",80);
>    }
>
>    foreach my $tag ( keys %{$fth->field} ) {
>        foreach my $value ( @{$fth->field->{$tag}} ) {
>        ####### I added the next 3 lines.
> 	   if (ref ($value) =~ /Bio::Annotation::SimpleValue/) {
> 	       $value = $value->value;
> 	   }
>
> ## --- cut ---
>
> I don't know whether my fix is sufficient or more should be changed to
>  maintain coherence... however, my fix seems to work.
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>
>
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------




More information about the Bioperl-l mailing list