[Biopython-dev] Biopython 1.60 plans and beyond

João Rodrigues anaryin at gmail.com
Mon Feb 20 14:30:23 UTC 2012


Hi all,

Answering what "concerns" me :)

>
> > If there are disordered regions (very common), the missing residues are
> > replaced with 'X' characters. These residues can be listed in the SEQRES
> > lines of the PDB header, if it's available, but they're not included with
> > the atomic coordinates, so PdbIO can't reliably fill in these disordered
> > residues for all PDB files. This matches the behavior of the tool I was
> > using before (which is non-free and not widely used).
>

The SEQRES records contain the sequence of the construct that was expressed
and crystallized, so it's never incomplete. What I've done in the past in
these situations is iterate over the SEQRES and fill in '-' for those
residues that do not have coordinates. I don't know if I have a decent
version of my MODELLER PIR format SeqIO code on GitHub, but maybe we could
work together to make it consistent (since what I wanted was essentially
PDB to sequence)? Or maybe these are two different views of the same
problem that need different solutions...

https://github.com/JoaoRodrigues/biopython/tree/modeller-pirIO
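
To make the SEQRES idea concrete, here is a rough sketch in plain Python. It
is not the code from the branch above and it does not use any Biopython API;
fill_gaps, observed and first_resnum are hypothetical names. It assumes the
residue numbers in the ATOM records map onto SEQRES positions with a fixed
offset, which breaks down for files with insertion codes or irregular
numbering.

```python
def fill_gaps(seqres, observed, first_resnum=1, gap="-"):
    """Return the SEQRES sequence with '-' where coordinates are missing.

    seqres       -- full construct sequence as one-letter codes
    observed     -- set of residue numbers that have atomic coordinates
    first_resnum -- residue number assigned to seqres[0] (an assumption:
                    numbering is contiguous from this value)
    """
    return "".join(
        aa if (first_resnum + i) in observed else gap
        for i, aa in enumerate(seqres)
    )


# Residues 4-6 are disordered, so they have no coordinates.
print(fill_gaps("MKTAYIAKQR", {1, 2, 3, 7, 8, 9, 10}))  # MKT---AKQR
```

In a real file the mapping between SEQRES and ATOM numbering would probably
need an alignment step rather than a fixed offset.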




> Rather than literally copying it, do you think it is realistic to make
> some of Bio.PDB work without NumPy? e.g. fall back on tuples
> of floats (x,y,z) for atom co-ordinates. Just brainstorming - this
> might be a horrible idea?
>

I kind of disagree, because otherwise we'd have to convert them to NumPy
arrays every time we need them.

Regarding my own work, I've been slowly cleaning up Bio.PDB a bit (for
example, all those get_X methods that just return class attributes) and
organising my own GSoC code into it and into Bio.Struct. I don't know when
I'll have this even "alpha"-testable; it's been a long road, and I had a
couple of computer crashes that made me lose my data, so... When would
there be a soft deadline for 1.60?

Best,

João



