[Bioperl-l] problem parsing pdb
Mark A. Jensen
maj at fortinbras.us
Fri Sep 18 15:55:47 UTC 2009
Hi Paola--
My research reveals that this is a "standard kludge" in pdb format. A letter
following a residue number is called an "insertion code" or "icode", and my
understanding is that it does allow for the insertion of residues without
upsetting the rest of the coordinates. (This is a feature, and not laziness,
since people very quickly begin to refer to amino acid coordinates based on a
reference sequence in an interesting region, and you can't easily say to the
community, "hey, that's 22 now, not 20...")
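To make the icode concrete, here is a minimal sketch of where it sits in an ATOM record, assuming the standard fixed-column PDB layout (residue name in columns 18-20, chain id in column 22, residue number in columns 23-26, icode in column 27). The record is synthetic, built with sprintf for illustration rather than copied from 1PXX, and residue_fields is a hypothetical helper, not a BioPerl function.

```perl
use strict;
use warnings;

# Build a synthetic ATOM record prefix (made-up values, not taken from 1PXX):
# record name, serial, atom name, altLoc, resName, chainID, resSeq, iCode.
my $line = sprintf '%-6s%5d %-4s%1s%3s %1s%4d%1s',
    'ATOM', 1234, ' CA ', ' ', 'ILE', 'A', 105, 'A';

# Hypothetical helper: pull the residue fields out by fixed column position.
sub residue_fields {
    my ($record) = @_;
    my $resname = substr $record, 17, 3;   # columns 18-20
    my $chain   = substr $record, 21, 1;   # column 22
    my $resseq  = substr $record, 22, 4;   # columns 23-26
    my $icode   = substr $record, 26, 1;   # column 27
    s/\s+//g for $resseq, $icode;          # strip padding; a blank icode becomes ''
    return ($resname, $chain, $resseq, $icode);
}

my ($resname, $chain, $resseq, $icode) = residue_fields($line);
print "$resname $chain $resseq$icode\n";
```

The residue number and the icode are separate fields in the file, which is why a parser has to decide how to join them when it builds an id.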
Since it's standard, you should expect it. Bio::Structure handles the icode by
creating the residue id as follows:
#my $res_name_num = $resname."-".$resseq;
my $res_name_num = $resname."-".$resseq;
$res_name_num .= '.'.$icode if $icode;
so you can get back the residue 3-letter name, its numerical position, and
insertion code by doing
my ($name, $number, $icode) = $res->id =~ /(.*?)-([0-9]+)\.?([A-Z]?)/;
In this case, if the icode is not present, then $icode eq '' (not undef).
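To get the form you asked for (105A rather than 105.A) without any intermediate file, one option is to rebuild the id from those three pieces. This is only a sketch: compact_res_id is a hypothetical helper, not part of BioPerl, and it assumes ids shaped like the ones built above (NAME-NUMBER, optionally followed by a point and a single uppercase letter).

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): turn an id such as
# "ILE-105.A" into "ILE-105A". Ids without an icode come back unchanged.
sub compact_res_id {
    my ($id) = @_;
    my ($name, $number, $icode) = $id =~ /^(.*?)-([0-9]+)\.?([A-Z]?)$/
        or return $id;    # leave ids we do not recognize untouched
    return "$name-$number$icode";
}

print compact_res_id($_), "\n" for qw(ILE-105.A ILE-2105.C GLY-45);
```

In your loop you would then print compact_res_id($res->id) instead of $res->id.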
Hope this helps-
Mark
----- Original Message -----
From: "Paola Bisignano" <paola_bisignano at yahoo.it>
To: <bioperl-l at bioperl.org>
Sent: Tuesday, September 08, 2009 4:55 AM
Subject: [Bioperl-l] problem parsing pdb
Hi,
I'm in a little trouble because I need to parse a pdb file exactly, to extract
the chain id and residue id, but I found that in some pdb files the residue number
is followed by a letter, probably because it is a residue added by the
crystallographers and they didn't want to change the residue numbering in the
sequence... for example the pdb 1PXX.pdb. I parsed it with my script below. I
didn't find any useful suggestion about this in the BioPerl tutorial or the
online BioPerl documentation.
#!/usr/local/bin/perl
use strict;
use warnings;
use Bio::Structure::IO;
use LWP::Simple;
# Download the PDB entry and save it locally
my $urlpdb =
"http://www.rcsb.org/pdb/download/downloadFile.do?fileFormat=pdb&compression=NO&structureId=1PXX";
my $content = get($urlpdb);
die "Could not download $urlpdb\n" unless defined $content;
my $pdb_file = qq{1pxx.pdb};
open my $f, '>', $pdb_file or die $!;
binmode $f;
print $f $content;
print qq{$pdb_file\n};
close $f;
# Parse the structure and write one "chain<TAB>residue id" line per residue
my $structio = Bio::Structure::IO->new(-file => $pdb_file);
my $struc = $structio->next_structure;
# Open the output file once, rather than once per residue
open my $out, '>>', '1pxx.parsed' or die $!;
for my $chain ($struc->get_chains) {
    my $chainid = $chain->id;
    for my $res ($struc->get_residues($chain)) {
        my $resid = $res->id;
        my $atoms = $struc->get_atoms($res);
        print $out "$chainid\t$resid\n";
    }
}
close $out;
but it gives me a file with an error at ILE 105A and ILE 2105C, because they have a
letter that follows the residue number... Can I solve that problem without
writing intermediate files?
I need to have the residue id as 105A, not 105.A,
so
A ILE-105A
without the point between number and letter...
Thank you all,
Paola