[Bioperl-l] Bio::Structure bug fix for seqres method

Alex Gutteridge alexg@ebi.ac.uk
Wed, 29 May 2002 16:42:30 +0100


Hi,

I've found a bug in the seqres method of Bio::Structure::Entry: it 
fails for some PDB entries (1akm and 1dob, and probably others). The 
subroutine is as follows in BioPerl 1.0:

sub seqres {
    my ($self, $chainid) = @_;
    my $s_u = "x4 A1 x7 A3 x1 A3 x1 A3 x1 A3 x1 A3 x1 A3 x1 A3 x1 A3 x1 A3 x1 A3 x1 A3 x1 A3 x1 A3";
    my $seq;
    if ( !defined $chainid) {
        my $m = ($self->get_models($self))[0];
        my $c = ($self->get_chains($m))[0];
        $chainid = $c->id;
    }
    my $seqres = ($self->annotation->get_Annotations("seqres"))[0];
    my $seqres_string = $seqres->as_text;
$self->debug("seqres : $seqres_string\n");
    $seqres_string =~ s/^Value: //;
    $seqres_string =~ s/\d+//g;        # no numbers needed
    $seqres_string =~ s/ \s //g;        # single character is Chain identifier
    $seqres_string =~ s/(\w+)/\u\L$1/g;    # ALA -> Ala  (for SeqUtils)
    $seqres_string =~ s/\s//g;         # strip all spaces
$self->debug("seqres : $seqres_string\n");

    # this will break for non-protein structures (about 10% for now) XXX KB
    my $pseq = Bio::PrimarySeq->new(-alphabet => 'protein');
    $pseq = Bio::SeqUtils->seq3in($pseq,$seqres_string);
    my $id = $self->id . "_" . $chainid;
    $pseq->id($id);
    return $pseq;
}

The lines that need changing are two of the substitutions applied to 
$seqres_string:

    $seqres_string =~ s/\d+//g;        # no numbers needed
    $seqres_string =~ s/ \s //g;        # single character is Chain identifier

should become

    $seqres_string =~ s/\d+/ /g;        # no numbers needed
    $seqres_string =~ s/ \S //g;        # single character is Chain identifier

This fixes the problem in 1akm and 1dob, but I've not tested it on any 
others so far.
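
To see the difference between the two versions in isolation, here is a 
minimal sketch that runs both sets of substitutions on a made-up string. 
The input "1 A 5 ALA GLY SER" and the names clean_old and clean_new are 
assumptions for illustration only; the real seqres annotation text may 
be laid out differently. The point is just that \s matches a whitespace 
character and \S a non-whitespace one, so only the second form can match 
a one-letter chain identifier sitting between two spaces.

```perl
use strict;
use warnings;

# Hypothetical input, NOT the actual annotation layout used by BioPerl.
my $example = "1 A 5 ALA GLY SER";

# Substitutions as they appear in the BioPerl 1.0 method.
sub clean_old {
    my ($s) = @_;
    $s =~ s/\d+//g;             # delete numbers outright
    $s =~ s/ \s //g;            # space, WHITESPACE, space
    $s =~ s/(\w+)/\u\L$1/g;     # ALA -> Ala
    $s =~ s/\s//g;              # strip all remaining whitespace
    return $s;
}

# Same steps with the two proposed changes.
sub clean_new {
    my ($s) = @_;
    $s =~ s/\d+/ /g;            # replace numbers with a single space
    $s =~ s/ \S //g;            # space, NON-whitespace, space
    $s =~ s/(\w+)/\u\L$1/g;     # ALA -> Ala
    $s =~ s/\s//g;              # strip all remaining whitespace
    return $s;
}

print "old: ", clean_old($example), "\n";   # prints "old: AAlaGlySer"
print "new: ", clean_new($example), "\n";   # prints "new: AlaGlySer"
```

On this input the old version leaves the chain identifier glued to the 
front of the residue string, so what would be handed to seq3in is no 
longer a clean run of three-letter codes. The new version removes it.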


Alex Gutteridge