[BioPython] Uniprot Parser
Ruchira Datta
ruchira.datta at gmail.com
Sun Feb 24 17:53:10 UTC 2008
On Sun, Feb 24, 2008 at 9:48 AM, Peter <biopython at maubp.freeserve.co.uk> wrote:
> On Sun, Feb 24, 2008 at 5:36 PM, Ruchira Datta <ruchira.datta at gmail.com> wrote:
> > I just found another bug, which would be a bit trickier to fix properly.
> >
> > This code:
> >
> > def database_cross_reference(self, line):
> >     # From CLD1_HUMAN, Release 39:
> >     # DR EMBL; [snip]; -. [EMBL / GenBank / DDBJ] [CoDingSequence]
> >     # DR PRODOM [Domain structure / List of seq. sharing at least 1 domai
> >     # DR SWISS-2DPAGE; GET REGION ON 2D PAGE.
> >     line = line[5:]
> >     # Remove the comments at the end of the line
> >     i = line.find('[')
> >     if i >= 0:
> >         line = line[:i]
> >     cols = line.rstrip(_CHOMP).split(';')
> >     cols = [col.lstrip() for col in cols]
> >     self.data.cross_references.append(tuple(cols))
> >
> > applied to this line of the TrEMBL record for A2RB21_ASPNG:
> >
> > DR GO; GO:0016277; F:[myelin basic protein]-arginine N-methyltra...; IEA:EC.
> >
> > got me this tuple:
> >
> > ('GO', 'GO:0016277', 'F:')
> >
> > The bracketed term was interpreted as a comment, so everything from the
> > '[' onward was stripped.
>
> That does look tricky... especially if we want to preserve backwards
> compatibility. This "F" cross reference looks like the partial text
> for the GO term. I wonder how common this is? (square brackets in the
> cross references themselves). I can't see the use of "F" mentioned
> here: http://www.expasy.org/sprot/userman.html#DR_line
>
> Could you file a bug and add a few more examples if you find them?
>
> Thanks
>
> Peter
>
Here 'F:' means the annotation refers to the molecular function part of the
Gene Ontology (as opposed to, e.g., 'P:' for biological process).
I think this is quite rare, but I'll see if any other examples come up.
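
One possible direction for a fix, as a rough sketch rather than a tested patch:
treat a '[' as the start of a trailing comment only when it is preceded by
whitespace, so that a bracket embedded in a field (as in 'F:[myelin ...') is
left alone. The function name parse_dr_line, the pattern _COMMENT_START, and
the value given for _CHOMP are assumptions made for illustration, and the EMBL
accession in the usage lines is made up. This heuristic would still misfire on
a field that itself begins with '[' after '; '.

```python
import re

# Assumed stand-in for the parser's _CHOMP constant (whitespace and
# trailing punctuation); the real value may differ.
_CHOMP = " \n\r\t.,;"

# Hypothetical rule: a '[' starts a trailing comment only when it is
# preceded by whitespace.
_COMMENT_START = re.compile(r"\s\[")


def parse_dr_line(line):
    """Split a DR line into a tuple of fields, dropping a trailing comment."""
    line = line[5:]  # drop the 'DR' line code and its padding
    match = _COMMENT_START.search(line)
    if match:
        line = line[:match.start()]
    cols = line.rstrip(_CHOMP).split(";")
    return tuple(col.lstrip() for col in cols)


# The GO line from A2RB21_ASPNG keeps its bracketed term.
go_fields = parse_dr_line(
    "DR   GO; GO:0016277; F:[myelin basic protein]-arginine N-methyltra...; IEA:EC.\n"
)

# An old-style line with trailing bracketed comments (accession is made up).
embl_fields = parse_dr_line(
    "DR   EMBL; X00000; -. [EMBL / GenBank / DDBJ] [CoDingSequence]\n"
)
```

With this rule the GO line should come back as four fields, with the third
holding the full 'F:[myelin basic protein]-arginine N-methyltra...' text, while
the old-style EMBL line still loses its trailing comments.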
--Ruchira