[BioPython] Uniprot Parser
Ruchira Datta
ruchira.datta at gmail.com
Sun Feb 24 17:36:56 UTC 2008
I just found another bug, which would be a bit trickier to fix properly.
This code:
def database_cross_reference(self, line):
    # From CLD1_HUMAN, Release 39:
    # DR   EMBL; [snip]; -. [EMBL / GenBank / DDBJ] [CoDingSequence]
    # DR   PRODOM [Domain structure / List of seq. sharing at least 1 domai
    # DR   SWISS-2DPAGE; GET REGION ON 2D PAGE.
    line = line[5:]
    # Remove the comments at the end of the line
    i = line.find('[')
    if i >= 0:
        line = line[:i]
    cols = line.rstrip(_CHOMP).split(';')
    cols = [col.lstrip() for col in cols]
    self.data.cross_references.append(tuple(cols))
applied to this line of the TrEMBL record for A2RB21_ASPNG:
DR   GO; GO:0016277; F:[myelin basic protein]-arginine N-methyltra...; IEA:EC.
got me this tuple:
('GO', 'GO:0016277', 'F:')
The bracketed term was interpreted as a comment, so everything from the '['
onward was stripped, including the rest of the description and the IEA:EC field.
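One possible workaround, sketched here as a hypothetical standalone function rather than Biopython's actual fix: only treat a bracketed group as a comment when it sits at the very end of the line and is preceded by whitespace, so brackets embedded inside a field (as in the GO term above) survive. The `_CHOMP` value below is an assumption meant to mirror the trailing-punctuation set used by the parser quoted above.

```python
import re

# Assumed to mirror the _CHOMP set (whitespace and trailing punctuation)
# used by the parser quoted above.
_CHOMP = " \n\r\t.,;"

# A whitespace-preceded bracketed group at the very end of the line.
_TRAILING_COMMENT = re.compile(r"\s+\[[^\[\]]*\]\s*$")


def parse_dr_line(line):
    """Split a DR line into a tuple of fields (hypothetical helper).

    Only bracketed comments trailing the line are removed, so brackets
    inside a field, such as 'F:[myelin basic protein]-...', are kept.
    """
    body = line[5:]  # drop the 'DR   ' line-type prefix
    # Peel trailing comments off one at a time, e.g. '[EMBL / ...] [CoDing...]'.
    match = _TRAILING_COMMENT.search(body)
    while match:
        body = body[:match.start()]
        match = _TRAILING_COMMENT.search(body)
    cols = body.rstrip(_CHOMP).split(";")
    return tuple(col.lstrip() for col in cols)
```

Applied to the A2RB21_ASPNG line, this heuristic keeps all four fields: ('GO', 'GO:0016277', 'F:[myelin basic protein]-arginine N-methyltra...', 'IEA:EC'). The trade-off is that a comment that is not at the end of the line, or whose brackets are unbalanced, is left in place.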
Thanks,
--Ruchira
On Sun, Feb 24, 2008 at 8:47 AM, Peter <biopython at maubp.freeserve.co.uk> wrote:
> On Sun, Feb 24, 2008 at 4:28 PM, Ruchira Datta <ruchira.datta at gmail.com> wrote:
> >
> > Hi Peter,
> >
> > I had tried SeqRecord first, but it didn't include the references, which I
> > absolutely need.
>
> The good news is I think the references are included now (in Biopython
> CVS), see enhancement Bug 2235:
> http://bugzilla.open-bio.org/show_bug.cgi?id=2235
>
> > While inclusion of newlines may be understandable, it's a bug. The newline
> > is stripped from several other fields by _RecordConsumer, e.g.,
> > ...
>
> Off the top of my head, I would say that example is a little different:
> reference number lines do not span multiple lines.
>
> > The newlines are never significant in any field.
>
> You are probably right - although perhaps they could be important in
> long text fields where a line break has been inserted mid word and a
> hyphenation added.
>
> The newlines are also important if using the Record object to recreate
> the raw file (e.g. to save to disk). However, I doubt anyone is doing
> this. Having a __str__ method defined, like the one in the
> Bio.GenBank.Record.Record object, would make this easier.
>
> > In a couple of weeks I might be able to check out the CVS
> > version and provide a patch.
>
> Please do.
>
> Peter
>
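On the newline question above, a minimal sketch of joining wrapped continuation lines into a single string without embedded newlines. It assumes a hypothetical helper that receives the text of each wrapped line with its line-type prefix already removed. A line ending in '-' is joined to the next without a space, which keeps hyphenated compounds intact but, as Peter notes, cannot tell a genuine hyphen from one added at a line break.

```python
def join_continuation_lines(lines):
    """Join wrapped continuation lines into one string (hypothetical helper).

    Each item is assumed to be the text of one line with its line-type
    prefix already removed. A line ending in '-' is joined to the next
    without a space (the hyphen is kept); otherwise one space is used.
    """
    result = ""
    for raw in lines:
        text = raw.strip()  # drops the newline and surrounding spaces
        if not text:
            continue  # skip blank lines
        if not result or result.endswith("-"):
            result += text
        else:
            result += " " + text
    return result
```

For example, the illustrative input ["binds a protein-", "arginine substrate\n"] yields "binds a protein-arginine substrate".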