[Biopython] Understanding pdb biopython

Tue Oct 28 19:36:53 UTC 2014

Hi Spencer, that helps.
I'll give the SEQRES iterator a try.

     On Tuesday, October 28, 2014 5:49 AM, Spencer Bliven <sbliven at ucsd.edu> wrote:

The sequence of the protein construct used for the structure (which may or may not match the UniProt sequence) is stored in the SEQRES records of the PDB file. You should be able to parse them using PdbSeqresIterator, which Bio.SeqIO exposes as the "pdb-seqres" format.
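Here is a minimal sketch of the SEQRES approach. It assumes a Biopython version whose SeqIO supports the "pdb-seqres" format, which is backed by PdbSeqresIterator. The helper name `seqres_by_chain` is hypothetical, and the PDB file (e.g. 3OE6.pdb) is passed on the command line:

```python
import sys

from Bio import SeqIO


def seqres_by_chain(source):
    """Return {chain_id: one-letter sequence} built from SEQRES records.

    Hypothetical helper, not part of Biopython. `source` may be a file
    name or an open text handle. SeqIO's "pdb-seqres" parser stores the
    chain identifier in record.annotations["chain"].
    """
    return {
        record.annotations["chain"]: str(record.seq)
        for record in SeqIO.parse(source, "pdb-seqres")
    }


if __name__ == "__main__" and len(sys.argv) > 1:
    # e.g. python seqres.py 3OE6.pdb
    for chain_id, seq in seqres_by_chain(sys.argv[1]).items():
        print(f">chain_{chain_id}")
        print(seq)
```

SEQRES describes the whole construct, whether or not each residue was resolved. That makes it the sequence to compare against the FASTA file.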

Hopefully that helps.
-Spencer

On Sun, Oct 26, 2014 at 9:44 PM, João Rodrigues <anaryin at gmail.com> wrote:

Hi Sanjeev,
Check for chain breaks. As I told you, iterate over the amino acids and, for each consecutive pair (e.g. residues 1 and 2), check the distance between the "C" atom of 1 and the "N" atom of 2. This is a very well-defined distance (the peptide bond, about 1.33 Å). Alternatively, and more simply, check CA-CA distances: consecutive CA atoms are about 3.8 Å apart, so >4 Å usually means a gap.
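The C-N check described above can be sketched with Bio.PDB as follows. The cutoffs (1.8 Å for C-N, 4.0 Å for CA-CA) and the function name `find_chain_breaks` are illustrative assumptions, not values or names taken from Biopython:

```python
import sys

from Bio.PDB import PDBParser
from Bio.PDB.Polypeptide import is_aa

# Illustrative cutoffs in Angstrom (assumptions; tune for your data):
MAX_C_N = 1.8    # a real C(i)-N(i+1) peptide bond is about 1.33 A
MAX_CA_CA = 4.0  # consecutive CA atoms are about 3.8 A apart


def find_chain_breaks(chain, use_ca=False):
    """Yield (residue_i, residue_j, distance) for consecutive amino acids in
    `chain` that do not look covalently linked. Hypothetical helper.

    By default the C atom of residue i is compared with the N atom of
    residue j; with use_ca=True the two CA atoms are compared instead.
    `distance` is None when a backbone atom needed for the test is missing.
    """
    if use_ca:
        name_i, name_j, cutoff = "CA", "CA", MAX_CA_CA
    else:
        name_i, name_j, cutoff = "C", "N", MAX_C_N
    # Keep amino acids (is_aa's default also accepts many modified
    # residues); skip waters and ligands.
    residues = [res for res in chain if is_aa(res)]
    for res_i, res_j in zip(residues, residues[1:]):
        if name_i not in res_i or name_j not in res_j:
            yield res_i, res_j, None
            continue
        # Subtracting two Bio.PDB Atom objects returns their distance.
        distance = float(res_i[name_i] - res_j[name_j])
        if distance > cutoff:
            yield res_i, res_j, distance


if __name__ == "__main__" and len(sys.argv) > 1:
    structure = PDBParser(QUIET=True).get_structure("X", sys.argv[1])
    for chain in structure[0]:  # first model only
        for res_i, res_j, dist in find_chain_breaks(chain):
            shown = "missing backbone atom" if dist is None else f"C-N = {dist:.2f} A"
            print(f"chain {chain.id!r}: break between residues "
                  f"{res_i.id[1]} and {res_j.id[1]} ({shown})")
```

For hundreds or thousands of files, the same function can be called in a loop over file names. Reporting missing backbone atoms as breaks is a deliberate, conservative choice.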
Sometimes no chain identifier is assigned to a chain, and Bio.PDB then reports the chain ID as " " (a space). Check column 22 of the ATOM records in those PDB files.
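The next sketch checks which records lack a chain ID without going through Bio.PDB. It is plain Python that reads column 22 directly (index 21 in a 0-based string), and the function name `count_chain_ids` is hypothetical:

```python
import sys
from collections import Counter


def count_chain_ids(lines):
    """Count ATOM/HETATM records per chain identifier. Hypothetical helper.

    The chain identifier sits in column 22 of these fixed-width records,
    i.e. index 21 of a 0-based Python string. A blank there is what
    Bio.PDB reports as a chain whose id is " ".
    """
    counts = Counter()
    for line in lines:
        if line.startswith(("ATOM", "HETATM")):
            # Truncated lines are treated as having a blank chain ID.
            chain_id = line[21] if len(line) > 21 else " "
            counts[chain_id] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as handle:
        counts = count_chain_ids(handle)
    if " " in counts:
        print(f"{counts[' ']} ATOM/HETATM records have a blank chain ID (column 22)")
```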
Cheers,
João

2014-10-26 11:31 GMT-05:00 Sanjeev Sariya <s.sariya_work at ymail.com>:

Hi Joao, thank you for the response. If not all residues are resolved in the crystal, then extracting the sequence from the PDB file wouldn't be a good idea.
I will be working with a lot of PDB files (~100s or 1000s) in the near future. Is there any way I can find the breaks in a PDB file?

- Another question: when I print the chain IDs in my script, I often get a chain " ", i.e. a space. In the script I sent, the code looks like this:
        from Bio.PDB import PDBParser

        st = PDBParser(QUIET=True).get_structure('X', i)  # i holds the PDB file path
        for chain in st.get_chains():
            print(chain.id)
Why is there a chain named with a space?

Thanks.

     On Saturday, October 25, 2014 12:32 AM, João Rodrigues <anaryin at gmail.com> wrote:

 Hi there,
The numbering in your PDB file is not continuous, and the jumps correspond to regions of the structure with missing residues. Open your PDB structure in PyMOL and you'll see them. Alternatively, print the C-N (peptide bond) distances for consecutive residues: where they are larger than ~3 Å, you have a break.
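The numbering jumps can also be listed programmatically. Below is a minimal sketch, assuming Bio.PDB. `numbering_gaps` is a hypothetical helper, and because it only compares residue numbers, its hits should be confirmed with the C-N distance check:

```python
import sys

from Bio.PDB import PDBParser
from Bio.PDB.Polypeptide import is_aa


def numbering_gaps(chain):
    """Return (before, after) pairs of residue numbers where consecutive
    amino acids in `chain` jump by more than one. Hypothetical helper.

    Bio.PDB residue ids are (hetero_flag, resseq, insertion_code); only
    resseq is compared, so insertion codes or renumbered files can give
    false alarms.
    """
    numbers = [res.id[1] for res in chain if is_aa(res)]
    return [(a, b) for a, b in zip(numbers, numbers[1:]) if b - a > 1]


if __name__ == "__main__" and len(sys.argv) > 1:
    structure = PDBParser(QUIET=True).get_structure("X", sys.argv[1])
    for chain in structure[0]:  # first model only
        for before, after in numbering_gaps(chain):
            print(f"chain {chain.id!r}: numbering jumps from {before} to {after}")
```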

As for the discrepancy between the sequences in the FASTA file and the PDB file, that's simply because not all residues are resolved in the crystal structure.
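The broken pieces in the original question most likely come from building the sequence out of the ATOM records. The sketch below assumes the script used Bio.PDB's PPBuilder (the script itself is not reproduced in this thread) and prints each piece with its residue range. `observed_segments` is a hypothetical helper:

```python
import sys

from Bio.PDB import PDBParser, PPBuilder


def observed_segments(structure):
    """Return (chain_id, first_resseq, last_resseq, sequence) for each
    polypeptide that PPBuilder builds from the ATOM records.
    Hypothetical helper.

    PPBuilder starts a new polypeptide wherever the C atom of one residue
    and the N atom of the next are too far apart to be bonded, so
    unresolved stretches split the sequence into pieces. Only the first
    model of the structure is used.
    """
    segments = []
    for pp in PPBuilder().build_peptides(structure):
        first, last = pp[0], pp[-1]
        segments.append(
            (first.get_parent().id, first.id[1], last.id[1], str(pp.get_sequence()))
        )
    return segments


if __name__ == "__main__" and len(sys.argv) > 1:
    structure = PDBParser(QUIET=True).get_structure("X", sys.argv[1])
    for chain_id, start, end, seq in observed_segments(structure):
        print(f"chain {chain_id!r} residues {start}-{end}: {seq}")
```

Joined together, the pieces give the observed sequence. It is typically shorter than the FASTA (or SEQRES) sequence because the residues between pieces have no coordinates.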
Cheers,
João
2014-10-24 13:10 GMT-05:00 Sanjeev Sariya <s.sariya_work at ymail.com>:

Hi all,
I'm having a hard time using and understanding Biopython's PDB module.
./read_pdb_file.py 3OE6.pdb
I'm attaching the Python script, PDB file, FASTA file and output to this mail. I have the following questions:
- When I print the sequence, I get it in broken pieces. Why?
- Also, the printed sequence doesn't match the FASTA file (attached).
- Am I making a silly mistake?
I am running the script as:
python read_pdb_file.py 3OE6.pdb
Kindly help and guide.

_______________________________________________
Biopython mailing list  -  Biopython at mailman.open-bio.org
http://mailman.open-bio.org/mailman/listinfo/biopython
