[BioPython] Uniprot Parser

Peter biopython at maubp.freeserve.co.uk
Sun Feb 24 16:47:01 UTC 2008


On Sun, Feb 24, 2008 at 4:28 PM, Ruchira Datta <ruchira.datta at gmail.com> wrote:
>
> Hi Peter,
>
>  I had tried SeqRecord first, but it didn't include the references, which I
> absolutely need.

The good news is I think the references are included now (in Biopython
CVS), see enhancement Bug 2235:
http://bugzilla.open-bio.org/show_bug.cgi?id=2235

> While inclusion of newlines may be understandable, it's a bug.  The newline
> is stripped from several other fields by _RecordConsumer, e.g.,
> ...

Off the top of my head, I would say that example is a little different
- reference number lines do not span multiple lines.

> The newlines are never significant in any field.

You are probably right - although perhaps they could be important in
long text fields where a line break has been inserted mid-word and a
hyphen added.
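
To illustrate the hyphenation case, here is a rough sketch of how the lines of one multi-line field could be joined once the newlines are stripped. The function name join_field_lines is made up for this example and is not part of Biopython. It assumes a line ending in a hyphen should be joined to the next line without a space, keeping the hyphen, since the parser cannot tell a real hyphen from one inserted at a line break.

```python
def join_field_lines(lines):
    """Join the physical lines of one field into a single string.

    Hypothetical helper for illustration only, not Biopython code.
    """
    parts = []
    for line in lines:
        text = line.rstrip("\r\n").strip()
        if not text:
            continue  # skip blank lines
        if parts and parts[-1].endswith("-"):
            # The break fell right after a hyphen: join without a space
            # and keep the hyphen (assumption, see above).
            parts[-1] += text
        else:
            parts.append(text)
    return " ".join(parts)


print(join_field_lines(["Belongs to the ATP-\n", "binding family.\n"]))
```

Whether the hyphen should be kept or dropped would depend on the file format's own wrapping rules, which this sketch does not try to settle.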

The newlines are also important if using the Record object to recreate
the raw file (e.g. to save to disk).  However I doubt anyone is doing
this.  Having a __str__ method defined, like there is in the
Bio.GenBank.Record.Record object, would make this easier.
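
As a rough idea of what I mean, something along these lines. This is only a sketch: MiniRecord is a made-up class with two fields, not the real SwissProt Record, and the wrap width of 70 is an arbitrary choice for the example. The point is that __str__ re-wraps the text itself, so the stored fields would not need to keep the original newlines.

```python
import textwrap


class MiniRecord:
    """Hypothetical two-field record, for illustration only."""

    def __init__(self, entry_name, description):
        self.entry_name = entry_name
        self.description = description

    def __str__(self):
        # Each output line is a two-letter code, three spaces, then the data.
        lines = ["ID   " + self.entry_name]
        # Re-wrap the description rather than relying on stored newlines.
        for chunk in textwrap.wrap(self.description, width=70):
            lines.append("DE   " + chunk)
        lines.append("//")  # end-of-record marker
        return "\n".join(lines) + "\n"


rec = MiniRecord("TEST_HUMAN", "Hypothetical protein")
print(str(rec), end="")
```

Output rebuilt this way would not necessarily match the original file byte for byte, since the line breaks are recomputed.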

> In a couple of weeks I might be able to check out the cvs
> version and provide a patch.

Please do.

Peter


