[Bioperl-l] Problems parsing swiss-prot files

Jason Stajich jason at cgt.duhs.duke.edu
Fri Jul 2 20:49:02 EDT 2004


I've fixed it in CVS.  I also fixed a bunch of other things in the swissprot
parsing, which I hope makes the parser cleaner.  This involved improving the
'new' function in Bio::Annotation::Reference, so you'll want to get that as
well if you are getting code from CVS.

Multi-line RP lines are now all put into the rp field of the
Annotation::Reference object.  The parser takes care of splitting it back
into multi-line fields upon writing (although I didn't test this case
specifically).
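
To make the join-on-read / split-on-write idea concrete, here is a minimal
sketch in Python.  It is not the BioPerl code: the function names, the sample
reference lines, and the wrap width are all assumptions made up for
illustration.  It relies only on the Swiss-Prot convention of a two-letter
line code followed by three spaces.

```python
import textwrap

RP_PREFIX = "RP   "  # two-letter line code plus three spaces


def join_rp_lines(reference_lines):
    """Fold the text of every RP line in one reference block into one string."""
    parts = []
    for line in reference_lines:
        if line.startswith(RP_PREFIX):
            parts.append(line[len(RP_PREFIX):].strip())
    return " ".join(parts)


def split_rp_field(rp, width=75):
    """Wrap a single RP string back into RP-prefixed lines.

    The default width is an assumption for this sketch, not a value
    taken from the BioPerl writer.
    """
    body_width = width - len(RP_PREFIX)
    chunks = textwrap.wrap(rp, body_width, break_on_hyphens=False)
    return [RP_PREFIX + chunk for chunk in chunks]


# Hypothetical reference block with two RP lines (not copied from a real entry).
example_reference = [
    "RN   [1]",
    "RP   SEQUENCE FROM N.A., AND VARIANTS",
    "RP   ALPHA AND BETA.",
    "RA   Doe J.;",
]

rp_field = join_rp_lines(example_reference)
rewritten = split_rp_field(rp_field, width=40)
```

Joining with a single space and re-wrapping on word boundaries means the
text survives a read/write cycle, although the line breaks may land in
different places than in the original file.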

PVH and our code auditors: as happy as I am about the code audit for
SeqIO and the like, and about making sure that things can roundtrip, I
really feel like the guts of these parsers could use a few weeks of
someone's time to clean them up first.  Of course, I and a few others would
want to simplify the sequence/annotation/feature object model first, so who
knows what the best starting point is...

-jason

On Fri, 2 Jul 2004, Jessica Dantzer wrote:

> Most of the references in most of the files have only one RP
> line.  Occasionally, there are two.  I haven't seen more than two,
> though.  One of the files that had more than one line in at least one
> reference was for P33897.  I'm parsing information on the mutation/variant
> data and their references, and so need some of the information on those
> second lines.
>
> At 03:55 PM 7/2/2004, Jason Stajich wrote:
> >Is there more than one RP line per reference?  The data structures and
> >parsers currently assume there is only one.
> >Can you send an accession so we can add it to the tests?
> >
> >-jason
> >On Thu, 1 Jul 2004, Jessica Dantzer wrote:
> >
> > > I'm working on parsing swiss-prot files for use in another database, and
> > > I've managed to work out where all the information I need is stored for
> > > the most part.  The only problems I'm encountering are with the reference
> > > parsing: some of the files have multiple "RP" lines, and I only seem to
> > > be able to get one.  The code seems to indicate that this is how the files
> > > are parsed.  Is there any other way to access the second line?
> > >
> > > Thanks,
> > > Jessica
> > >
> > >
> > >
> > >
> > > _______________________________________________
> > > Bioperl-l mailing list
> > > Bioperl-l at portal.open-bio.org
> > > http://portal.open-bio.org/mailman/listinfo/bioperl-l
> > >
> >
> >--
> >Jason Stajich
> >Duke University
> >jason at cgt.mc.duke.edu
>

--
Jason Stajich
Duke University
jason at cgt.mc.duke.edu

