[Biopython] Understanding pdb biopython

Spencer Bliven sbliven at ucsd.edu
Tue Oct 28 09:48:55 UTC 2014


The sequence of the protein construct used for the structure (which may or
may not match the UniProt sequence) is stored in the SEQRES records of the
PDB file. You should be able to parse them using PdbSeqresIterator
<http://biopython.org/DIST/docs/api/Bio.SeqIO.PdbIO-module.html>.
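
A minimal sketch of reading the SEQRES records through Bio.SeqIO, assuming a
Biopython version that accepts the "pdb-seqres" format name and a file path
in SeqIO.parse. The helper name seqres_by_chain is only illustrative.

```python
def seqres_by_chain(pdb_path):
    """Map each chain ID to the SEQRES sequence declared in a PDB file."""
    # Imported here so the sketch only needs Biopython when it is called.
    from Bio import SeqIO

    sequences = {}
    for record in SeqIO.parse(pdb_path, "pdb-seqres"):
        # The parser stores the chain ID in the record annotations.
        chain_id = record.annotations["chain"]
        sequences[chain_id] = str(record.seq)
    return sequences
```

Calling seqres_by_chain("3OE6.pdb") should then give one full construct
sequence per chain, including residues that are not resolved in the
coordinates.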

Hopefully that helps.
-Spencer

On Sun, Oct 26, 2014 at 9:44 PM, João Rodrigues <anaryin at gmail.com> wrote:

> Hi Sanjeev,
>
> To check for breaks, as I told you, iterate over the amino acids and, for
> each consecutive pair (e.g. residues 1 and 2), check the distance between
> the "C" atom of residue 1 and the "N" atom of residue 2. This is a very
> well-defined distance (the peptide bond). Alternatively, and more simply,
> check CA-CA distances (e.g. more than 4 Å usually means a gap).
>
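
The C-N check described above could be sketched roughly as follows. The
helper names find_backbone_breaks and breaks_per_chain are hypothetical, and
the 2.0 Å default cutoff is an assumption (a peptide C-N bond is about
1.33 Å), so adjust it as needed. The sketch relies on Bio.PDB atoms
returning their distance when subtracted.

```python
def find_backbone_breaks(residues, max_cn_distance=2.0):
    """Return (number_before, number_after, distance) for each chain break.

    `residues` is an ordered list of Bio.PDB residues from one chain.
    The 2.0 A default cutoff is an assumed value, not a Biopython setting.
    """
    # Skip residues that lack the backbone atoms needed for the check.
    usable = [res for res in residues if "C" in res and "N" in res]
    breaks = []
    for prev, nxt in zip(usable, usable[1:]):
        # Subtracting two Bio.PDB atoms gives their distance in angstroms.
        distance = prev["C"] - nxt["N"]
        if distance > max_cn_distance:
            breaks.append((prev.get_id()[1], nxt.get_id()[1], distance))
    return breaks


def breaks_per_chain(pdb_path):
    """Run the check on every chain of the first model in a PDB file."""
    from Bio.PDB import PDBParser  # needs Biopython only when called

    structure = PDBParser(QUIET=True).get_structure("X", pdb_path)
    result = {}
    for chain in structure[0]:
        # A blank hetero flag (id[0] == " ") excludes waters and ligands.
        residues = [res for res in chain if res.id[0] == " "]
        result[chain.id] = find_backbone_breaks(residues)
    return result
```

For many files, breaks_per_chain can simply be called in a loop over the
file names.
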
> Sometimes no chain identifier is assigned to a particular chain. For those
> PDB files, check column 22 of the ATOM records.
>
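
One way to inspect column 22 directly, without Biopython, is sketched below.
The function name count_chain_ids is made up for illustration. A blank in
that column is what shows up as a chain named " " in Bio.PDB.

```python
from collections import Counter


def count_chain_ids(pdb_lines):
    """Count ATOM/HETATM records per chain identifier (column 22)."""
    counts = Counter()
    for line in pdb_lines:
        if line.startswith(("ATOM", "HETATM")) and len(line) >= 22:
            # Column 22 of the record is index 21 in the Python string.
            counts[line[21]] += 1
    return counts
```

Passing an open file handle works, e.g. count_chain_ids(open("3OE6.pdb")).
A " " key in the result means some records have no chain identifier.
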
> Cheers,
>
> João
>
>
>
>
>
> 2014-10-26 11:31 GMT-05:00 Sanjeev Sariya <s.sariya_work at ymail.com>:
>
>
>> Hi Joao,
>> Thank you for your response.
>> If not all residues are resolved in the crystal, then extracting the
>> sequence from the PDB file wouldn't be a good idea.
>>
>> I will be working with a lot of files [~100s or 1000s] in the near future.
>> Is there any way I can find the breaks in my PDB file?
>>
>> - Another question: when printing the chain IDs in my script, I often
>> get a chain " ", that is, a space.
>> In the script I sent, the code looks like:
>>
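>>         from Bio.PDB import PDBParser
>>         # 'i' is the path of the PDB file being processed
>>         st = PDBParser(QUIET=True).get_structure('X', i)
>>         for chain in st.get_chains():  # do not reuse 'i' as the loop variable
>>             print(chain.id)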
>>
>> Why is there a chain whose name is a space?
>>
>> Thanks.
>>
>>   On Saturday, October 25, 2014 12:32 AM, João Rodrigues <
>> anaryin at gmail.com> wrote:
>>
>>
>> Hi there,
>>
>> The numbering in your PDB file is not continuous, and the jumps match
>> regions of the structure that are missing residues. Open your PDB structure
>> in PyMOL and you'll see. Alternatively, print the C-N distances (peptide
>> bond) for consecutive residues and you'll notice that the pairs with
>> distances larger than ~3 Å correspond to your breaks.
>>
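
The numbering jumps mentioned above can be listed with a few lines of plain
Python. The name numbering_gaps is hypothetical. With Bio.PDB the residue
numbers can be collected as residue.id[1] for each residue of a chain. A jump
in numbering alone does not prove that residues are missing, so it is best
combined with a distance check.

```python
def numbering_gaps(residue_numbers):
    """Return (last_before, first_after) pairs where numbering skips ahead."""
    gaps = []
    for prev, nxt in zip(residue_numbers, residue_numbers[1:]):
        # Consecutive residues normally differ by exactly one.
        if nxt - prev > 1:
            gaps.append((prev, nxt))
    return gaps
```
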
>> As for your discrepancy between the sequences in the FASTA file and the
>> PDB, that's just because not all residues are resolved in the crystal
>> structure.
>>
>> Cheers,
>>
>> João
>>
>> 2014-10-24 13:10 GMT-05:00 Sanjeev Sariya <s.sariya_work at ymail.com>:
>>
>> Hi All,
>> I'm having a hard time using and understanding the Biopython PDB module.
>> ./read_pdb_file.py 3OE6.pdb
>>
>> I'm attaching the Python script, PDB file, FASTA file and output to this
>> mail. I have the following questions:
>> - When I print the sequence, I get it in broken pieces. Why?
>> - Also, the sequence printed doesn't match the FASTA file (attached).
>> - Am I making a silly mistake?
>>
>> I am running the script as:
>> python read_pdb_file.py 3OE6.pdb
>>
>> Kindly help and guide.
>>
>>
>> _______________________________________________
>> Biopython mailing list  -  Biopython at mailman.open-bio.org
>> http://mailman.open-bio.org/mailman/listinfo/biopython
>>
>>
>>
>>
>>