[Bioperl-l] PDB sequence from ATOM records

Jurgen Pletinckx jurgen.pletinckx at algonomics.com
Mon Jul 5 09:40:36 EDT 2004


I have got an implementation ready as well, and hope to fold it
into the codebase soon.

I would just like to point out that this 

# Gaps in the crystal structure: You'll have to pad with 'X's where the 
# ATOM records are missing. E.g. if you have residues 
# GLY40,GLU41,CYS44,LYS45. You need to make the sequence GEXXCK.

is, imo, incorrect - the authors may well be using a historical 
numbering scheme for their protein of interest, and 'missing' residue
numbers can correspond to a shorter loop in the new structure when
compared to the 'reference' numbering. In other words - don't insert
'missing' residues for the user, as you can be wrong.

# The only other comment I would make is that I think the atom_seq method 
# should be attached to the Chain object, not the Entry object. And so 
# called by $chain->atom_seq not $entry->atom_seq('A'). The sequence is a 
# property of a chain so this makes the most sense to me personally.

Concur. Except, of course, I can't write it that way: a chain doesn't 
know what residues are part of it, or what structure it is in. So I've
stuck the method on Entry level, together with all other methods.

Patch to Entry.pm, Residue.pm and IO/pdb.pm is attached - enjoy! (And,
of course, review, comment and criticize)

-- 
Jurgen Pletinckx
AlgoNomics NV 

-------------- next part --------------
> diff  /usr/local/lib/perl5/site_perl/5.6.1/Bio/Structure/IO/pdb.pm bioperl-dev/Bio/Structure/IO/
1206a1207,1208
>                               $residue->type($resname);
>                               $residue->number($resseq);

> diff /usr/local/lib/perl5/site_perl/5.6.1/Bio/Structure/Residue.pm  bioperl-dev/Bio/Structure/
157a158,193
> =head2 number()
> 
>  Title   : number
>  Usage   : $residue->number(212)
>  Function: Gets/sets the residue number for this residue
>  Returns : the residue number
>  Args    : the residue number
> 
> =cut
> 
> sub number {
>         my ($self, $value) = @_;;
>         if (defined $value) {
>               $self->{'number'} = $value;
>         }
>         return $self->{'number'};
> }
> 
> =head2 type()
> 
>  Title   : type
>  Usage   : $residue->type("TRP")
>  Function: Gets/sets the residue type for this residue
>  Returns : the residue type
>  Args    : the residue type
> 
> =cut
> 
> sub type {
>         my ($self, $value) = @_;;
>         if (defined $value) {
>               $self->{'type'} = $value;
>         }
>         return $self->{'type'};
> }
> 


> diff /usr/local/lib/perl5/site_perl/5.6.1/Bio/Structure/Entry.pm bioperl-dev/Bio/Structure/
715a716,763
> =head2 sequence()
> 
>  Title   : sequence
>  Usage   : $seqobj = $structure->sequence("A");
>  Function: gets a sequence object containing the sequence as present in the coord-
>            inate section.
>           if a chain-ID is given , the sequence for this chain is given, if none
>           is provided the first chain is chosen
>  Returns : a Bio::PrimarySeq
>  Args    : the chain-ID of the chain you want the sequence from
> 
> =cut
> 
> sub sequence {
>       my ($self, $chainid) = @_;
>       my $chain;
>       if ( !defined $chainid) {
>               my $m = ($self->get_models($self))[0];
>               $chain = ($self->get_chains($m))[0];
>               $chainid = $chain->id;
>       }
>       else
>       {
>               my $m = ($self->get_models($self))[0];
>               for my $ch ($self->get_chains($m))
>               {
>                       next unless $ch->id eq $chainid;
>                       $chain = $ch;
>                       last;
>               }
>               $self->throw("'$chainid' is not a valid chain id for structure ".$self->id) unless $chain;
>       }
> 
>       my $seq_string;
>       for my $res ( $self->get_residues($chain))
>         {
>               $seq_string .= ucfirst(lc($res->type));
>       }
> 
> $self->debug("sequence : $seq_string\n");
>       # this will break for non-protein structures (about 10% for now) XXX KB
>       my $pseq = Bio::PrimarySeq->new(-alphabet => 'protein');
>       $pseq = Bio::SeqUtils->seq3in($pseq,$seq_string);
>       my $id = $self->id . "_" . $chainid;
>       $pseq->id($id);
>       return $pseq;
> }
> 


More information about the Bioperl-l mailing list