[Biojava-dev] EmblFileFormer
Matthew Pocock
matthew_pocock at yahoo.co.uk
Tue Sep 2 12:22:57 EDT 2003
Hi Lorna,
Yes - the fault goes back to the EMBL parser, not the
writer. The parser should be keeping track of the RN,
RP, etc. lines, and whenever a complete block has been
read, it should emit a single annotation event (perhaps
called REFERENCE?) carrying all the data from that
block. These events would then be collected into a
list, with one element for each reference block.
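To make the grouping concrete, here is a minimal plain-Java sketch. It does not use the BioJava event API: `ReferenceGrouper`, `group` and the map-per-block representation are hypothetical names chosen for illustration. It assumes the usual EMBL line layout (a two-letter tag, with the value starting at column 6), treats each RN line as the start of a new reference block, and closes the block at the first non-reference line (such as XX).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not BioJava API: groups EMBL reference lines into blocks.
final class ReferenceGrouper {

    // Line tags that belong to a reference block, in EMBL order.
    private static final List<String> REF_TAGS =
        List.of("RN", "RC", "RP", "RX", "RG", "RA", "RT", "RL");

    // Returns one map (tag -> value) per reference block, in file order.
    static List<Map<String, String>> group(List<String> lines) {
        List<Map<String, String>> blocks = new ArrayList<>();
        Map<String, String> current = null;
        for (String line : lines) {
            if (line.length() < 2) {
                continue; // ignore blank or truncated lines
            }
            String tag = line.substring(0, 2);
            String value = line.length() > 5 ? line.substring(5).trim() : "";
            if (tag.equals("RN")) {
                // An RN line opens a new block; flush the previous one first.
                if (current != null) {
                    blocks.add(current);
                }
                current = new LinkedHashMap<>();
                current.put(tag, value);
            } else if (current != null && REF_TAGS.contains(tag)) {
                // Repeated tags (e.g. multi-line RA or RT) are joined with a space.
                current.merge(tag, value, (a, b) -> a + " " + b);
            } else if (current != null) {
                // Any other tag (XX, FH, ...) ends the current block.
                blocks.add(current);
                current = null;
            }
        }
        if (current != null) {
            blocks.add(current);
        }
        return blocks;
    }
}
```

In a real parser, each completed block would be the payload of the proposed REFERENCE event rather than an element of a returned list.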
The file former would then need to be modified to
unpack the REFERENCE list, but this would not be a big
deal.
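On the writing side, unpacking such a list is just a nested loop. The sketch below is again hypothetical and independent of the real `EmblFileFormer`: `ReferenceWriter` and `format` are illustrative names, each reference is assumed to be a map from line tag to value, and the tags are written in a fixed order one block at a time, followed by an XX separator. It does not wrap long values onto continuation lines the way a real EMBL writer would.

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not BioJava API: writes reference blocks nested, not merged.
final class ReferenceWriter {

    // Order in which reference lines appear within each block.
    private static final List<String> ORDER =
        List.of("RN", "RC", "RP", "RX", "RG", "RA", "RT", "RL");

    static String format(List<Map<String, String>> references) {
        StringBuilder out = new StringBuilder();
        for (Map<String, String> ref : references) {
            // Emit every line of this block before moving on to the next one.
            for (String tag : ORDER) {
                String value = ref.get(tag);
                if (value != null) {
                    // Two-letter tag plus three spaces puts the value at column 6.
                    out.append(tag).append("   ").append(value).append('\n');
                }
            }
            out.append("XX\n"); // separator between reference blocks
        }
        return out.toString();
    }
}
```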
If you are keen to do this, then we can talk you
through it, either here or on chat (irc.freenode.net,
#biojava).
Matthew
--- Lorna Morris <lmorris at ebi.ac.uk> wrote:
> Hi
>
> I'm using BioJava to parse an EMBL flat file, add extra
> annotation, and dump the new file out at the end. The parser
> seems to be really useful for this. However, the file created
> using SeqIOTools.writeEmbl contains errors: the RN, RP, RX, RA,
> RT and RL lines aren't correctly nested. These lines should
> occur in repeated sets, but the output has all the RN lines,
> followed by all the RP lines, etc., so they are merged rather
> than nested.
>
> I've looked at the javadoc for the class EmblFileFormer and there
> is a comment that might relate to this problem:
>
> * <p><code>EmblFileFormer</code> performs the detailed formatting of
> * EMBL entries for writing to a <code>PrintStream</code>. Currently
> * the formatting of the header is not correct. This really needs to
> * be addressed in the parser which is merging fields which should
> * remain separate.</p>
>
> I've tried to address the problem by modifying the class
> SeqIOEventEmitter, but have run into difficulties, because I
> cannot untangle which RN, RP, RX, RA, RT and RL lines 'belong'
> together in a single block, as the annotation is just stored in
> an ArrayList. Maybe I should take note of the javadoc comment
> above and address the problem in the parser. If so, could you
> give me some pointers on which classes I should focus on in
> order to fix this, and tell me whether you think it will be a
> difficult problem to solve?
>
> Hope this makes sense.
>
> Many thanks,
>
> Lorna
>
>
>
> -------------------------------------------------------------------
> Lorna Morris
> EMBL-European Bioinformatics Institute
> Wellcome Trust Genome Campus, Hinxton
> Cambridge CB10 1SD, UK
> Tel: +44-(0)1223-492507
> Fax: +44-(0)1223-494468
> email: lmorris at ebi.ac.uk