[Biopython] Proteomics tools for biopython

David Martin d.m.a.martin at dundee.ac.uk
Wed Feb 15 17:22:48 UTC 2012

> Hey David,
> What sort of tools do you have in mind for proteomics?  I have quite a few stashed away (3/6 frame translations, GFF files->proteins, X!Tandem parsers/FDR calculators, GFF parsers, etc.)
> Chris

At present we are wrapping the OpenMS outputs (featureXML etc.) so that we can interrogate in detail how the runs behave. It is insightful to see, for example, how many of the MS/MS spectra fall on overlapping peptides, and how MS/MS selections are distributed per feature (versus intensity). This is just the first stage. Having these data, which until now have been difficult to access, lets us build smarter tools: custom delta mass thresholds for each MS/MS spectrum, second-peptide searching, checking whether all the peptide IDs for a feature agree, and correlating IDs from different search engines to the same spectra.
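
A minimal sketch of the per-feature summaries described above, using only the Python standard library. The XML is a toy stand-in loosely modelled on featureXML. The element names `feature`, `intensity`, `PeptideIdentification` and `PeptideHit`, and the `sequence` attribute, are assumptions to check against the real OpenMS schema, and `summarise_features` is a hypothetical helper, not an OpenMS or Biopython function.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical, heavily simplified input loosely modelled on OpenMS featureXML.
# Element and attribute names are assumptions; check the real schema.
TOY_FEATURES = """
<featureList>
  <feature id="f1">
    <intensity>2.0e6</intensity>
    <PeptideIdentification><PeptideHit sequence="LVNELTEFAK"/></PeptideIdentification>
    <PeptideIdentification><PeptideHit sequence="LVNELTEFAK"/></PeptideIdentification>
  </feature>
  <feature id="f2">
    <intensity>4.5e5</intensity>
    <PeptideIdentification><PeptideHit sequence="HLVDEPQNLIK"/></PeptideIdentification>
    <PeptideIdentification><PeptideHit sequence="YLYEIAR"/></PeptideIdentification>
  </feature>
  <feature id="f3">
    <intensity>8.0e4</intensity>
  </feature>
</featureList>
"""


def summarise_features(xml_text):
    """Return (feature_id, intensity, n_ms2_ids, ids_agree) for each feature.

    ids_agree is True when every identification on the feature has the same
    top-hit sequence, False when they disagree, and None when there are none.
    """
    root = ET.fromstring(xml_text)
    rows = []
    for feature in root.iter("feature"):
        intensity = float(feature.findtext("intensity", default="0"))
        # Treat the first PeptideHit of each identification as its top hit.
        top_hits = []
        for pep_id in feature.findall("PeptideIdentification"):
            hit = pep_id.find("PeptideHit")
            if hit is not None:
                top_hits.append(hit.get("sequence"))
        agree = len(set(top_hits)) == 1 if top_hits else None
        rows.append((feature.get("id"), intensity, len(top_hits), agree))
    return rows


rows = summarise_features(TOY_FEATURES)
for row in rows:
    print(row)

# Distribution of MS/MS identifications per feature.
per_feature = Counter(n for _, _, n, _ in rows)
print(per_feature)
```

Pairing the identification count with the intensity column gives the "selections per feature versus intensity" view, and the `ids_agree` flag marks features whose spectra point at different peptides.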

Our users have outstanding questions such as 'Is it really necessary to do duplicate runs?', or in other words, can we get the instrument to treat duplicate runs differently to optimise identification? (This is under the principle that madness is doing the same thing repeatedly but expecting different results.)
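
One simple way to put a number on the duplicate-run question is to compare the sets of peptides identified in each run and ask how much the second run adds. The sketch below is an illustration under that assumption, not the method described in the post; `duplicate_run_gain` is a hypothetical helper and the peptide lists are toy data.

```python
def duplicate_run_gain(run_a, run_b):
    """Summarise how much a second (duplicate) run adds over the first.

    run_a, run_b: iterables of identified peptide sequences from each run.
    """
    a, b = set(run_a), set(run_b)
    shared = a & b
    new_in_b = b - a
    union = a | b
    return {
        "shared": len(shared),
        "new_in_second_run": len(new_in_b),
        # Jaccard index: overlap relative to everything seen in either run.
        "jaccard": len(shared) / len(union) if union else 1.0,
        # Fraction of the combined identifications found only in run B.
        "gain_fraction": len(new_in_b) / len(union) if union else 0.0,
    }


# Toy identifications for two duplicate runs (hypothetical data).
run1 = ["LVNELTEFAK", "YLYEIAR", "HLVDEPQNLIK", "AEFAEVSK"]
run2 = ["LVNELTEFAK", "YLYEIAR", "DAFLGSFLYEYSR", "AEFAEVSK"]
summary = duplicate_run_gain(run1, run2)
print(summary)
```

A high Jaccard index with a small gain fraction suggests the duplicate mostly re-identifies the same peptides, which is the situation where treating the second run differently (for example, excluding already-identified precursors) might pay off.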

Parsers for X!Tandem would be really useful, as that is something we'd like to have in our tool chain. A Mascot one would be good too; I am looking into that (it is on my list of things to do, just not near the top right now). I very much favour a modular approach where each class/object does one thing really well and can feed its output to another class, with everything represented in open formats. It might be a good idea to arrange a teleconference or Skype group chat for people who are interested in contributing to this and building a comprehensive set of tools into Biopython. I can't promise too much from our end, but we are making good progress, and we have a strong commitment to open software and algorithms, with a heavy Python development presence.
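
The modular approach above can be sketched as small pieces that each do one job and pass iterators along: a parser, a filter, and a writer back to an open text format. Everything here is hypothetical: the `PSM` record, the three function names and the tab-separated layout are illustrative assumptions, not Biopython APIs and not the X!Tandem or Mascot output formats.

```python
import csv
import io
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass(frozen=True)
class PSM:
    """One peptide-spectrum match (a minimal, hypothetical record)."""
    spectrum_id: str
    sequence: str
    score: float


def read_psms(handle) -> Iterator[PSM]:
    """Parse a hypothetical tab-separated export: spectrum_id, sequence, score."""
    for row in csv.DictReader(handle, delimiter="\t"):
        yield PSM(row["spectrum_id"], row["sequence"], float(row["score"]))


def filter_by_score(psms: Iterable[PSM], min_score: float) -> Iterator[PSM]:
    """Keep only matches at or above a score threshold."""
    return (p for p in psms if p.score >= min_score)


def write_psms(psms: Iterable[PSM], handle) -> int:
    """Write matches out as tab-separated text; return how many were written."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["spectrum_id", "sequence", "score"])
    n = 0
    for p in psms:
        writer.writerow([p.spectrum_id, p.sequence, p.score])
        n += 1
    return n


# Chain the stages: parse -> filter -> write.
raw = "spectrum_id\tsequence\tscore\ns1\tLVNELTEFAK\t42.0\ns2\tYLYEIAR\t12.5\n"
out = io.StringIO()
count = write_psms(filter_by_score(read_psms(io.StringIO(raw)), 20.0), out)
print(count)
print(out.getvalue())
```

Because each stage only consumes and yields `PSM` records, a real X!Tandem or Mascot parser could replace `read_psms` without touching the filter or the writer.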
