[BioPython] Checked in Parsers for Phred and Ace files

Mon Feb 2 22:16:49 EST 2004

Hi Brad,

> Hi Frank;
> 
> > > The one caveat is that the Ace parser does not support RT, CT and WA 
> > > tags.
> 
> > I couldn't stand a parser that's not completely doing its job,
> > especially when it's written by myself :-). So I added support for these
> > three tags. 
> 
> Great -- thanks. It's good to have you looking after, uh, yourself.
> Someone's got to do it :-)
> 
> > It seems to work fine; however, before I submit it, I'd
> > like to be sure that it complies with the specification of the ACE format.
> > The consed documentation is not very precise about many things, so does
> > anybody know where to find a more comprehensive description of the ACE
> > format? Is there any?
> 
> I don't know anything specific, but there is a section in the consed
> docs (ACE FILE FORMAT): 
> 
> http://www.phrap.org/consed/distributions/README.13.0.txt
>

Yep, that's what I used for these tags. I just thought there might be
something available, like an exact syntax diagram for the ACE format, to clarify
some minor uncertainties.

> There is a brief description of those three tags there and what each
> element of the tags stands for. Beyond that I personally
> have no idea -- I guess if someone has experience working with and
> adding these tags they might be able to chime in about how they are
> used. 
> 
> But independent of that, parsing them into lists of lists or some
> other "basic" data structure seems to be a decent enough way to
> handle them.
> 
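A minimal sketch of the "lists of lists" approach, assuming the block layout described in the consed README: a line reading `RT{`, `CT{` or `WA{`, one header line of whitespace-separated fields, optional extra lines (comments for CT, free-form data for WA), and a closing `}`. The helper `parse_tags` and the sample data are hypothetical illustrations, not the interface of the Biopython Ace parser.

```python
import io

# Opening lines of the three tag blocks (layout assumed from the consed README).
TAG_OPENERS = {"RT{": "RT", "CT{": "CT", "WA{": "WA"}


def parse_tags(handle):
    """Hypothetical helper: collect RT, CT and WA blocks from an ACE handle.

    Returns a dict mapping "RT", "CT" and "WA" to a list of tags, where each
    tag is a list [header_fields, extra_lines].
    """
    tags = {kind: [] for kind in TAG_OPENERS.values()}
    lines = iter(handle)
    for line in lines:
        kind = TAG_OPENERS.get(line.strip())
        if kind is None:
            continue  # not the start of a tag block
        # The first line inside the braces holds the whitespace-separated fields.
        header = next(lines, "").split()
        extra = []
        for inner in lines:
            if inner.strip() == "}":
                break  # end of this tag block
            extra.append(inner.rstrip("\n"))
        tags[kind].append([header, extra])
    return tags


# Made-up sample data, for illustration only.
sample = """CO Contig1 10 1 1 U
ACGTACGTAC

RT{
read1.s1 comment phred 22 75 971104:111005
}

CT{
Contig1 repeat consed 3 8 971218:180623
}

WA{
phrap_params phrap 970928:084720
phrap standard.fasta.screen -new_ace
}
"""

tags = parse_tags(io.StringIO(sample))
print(tags["CT"][0][0][0])  # Contig1
```

Keeping each tag as plain lists leaves interpretation of the individual fields to the caller, which fits the uncertainty about the exact syntax.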

Right. That's roughly what I did. One caveat is that these tags are (usually?
always? necessarily?) at the end of the file, even though CT and RT refer to
specific contigs or reads. This means that the RecordParser would have to check
the end of the file for each record to find out whether there is one of these
nasty tags, and then return it with its corresponding contig record. That is not
how the standard record parser works, although it could probably be bent and
twisted to do so.
Anyway, if CT, RT and WA are of interest, the easiest workaround is to use the
full ACEParser, which reads the whole file at once. These tags are then stored
as lists in the main data structure, so the data structure reflects the file
structure. That's how things are done now. Besides, nobody uses these tags
anyway :-)
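To illustrate why a streaming record parser has trouble here, the following sketch reads the whole file before pairing each CT tag with its contig. It assumes that contigs are introduced by `CO <name> ...` lines and that the first header field of a CT block is the contig name. The helper `contig_tags` and the sample data are hypothetical, not the ACEParser interface.

```python
import io


def contig_tags(handle):
    """Hypothetical helper: map each contig name to the headers of its CT tags.

    The whole file is consumed before any tag is attached, so nothing can be
    returned for a contig until the end of the file has been reached.
    """
    contigs = {}   # contig name -> list of CT header field lists
    pending = []   # CT headers collected during the pass, attached afterwards
    lines = iter(handle)
    for line in lines:
        fields = line.split()
        if fields and fields[0] == "CO":
            contigs[fields[1]] = []  # a new contig record starts here
        elif line.strip() == "CT{":
            pending.append(next(lines, "").split())
            # Skip any comment lines up to the closing brace.
            for inner in lines:
                if inner.strip() == "}":
                    break
    # Attach tags only after the full pass, wherever the CT blocks appeared.
    for tag in pending:
        if tag and tag[0] in contigs:
            contigs[tag[0]].append(tag)
    return contigs


# Made-up sample data, for illustration only.
sample = """CO Contig1 10 1 1 U
ACGTACGTAC

CO Contig2 8 1 1 U
GGCCGGCC

CT{
Contig2 repeat consed 2 5 971218:180623
}
"""

result = contig_tags(io.StringIO(sample))
print(result["Contig2"][0][1])  # repeat
```

Pairing tags after the full pass means it does not matter where the CT blocks sit in the file; the price is that no contig can be handed back before the whole file has been read, which is the trade-off behind using the full ACEParser for these tags.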

I'll send you the updated parser soon. A nexus parser is next on the todo list...

Cheers,
Frank 

> Hope this helps -- thanks again for all the work!

We have to thank you guys - I don't know what we'd do without Biopython!!

Frank

> Brad
> _______________________________________________
> BioPython mailing list  -  BioPython at biopython.org
> http://biopython.org/mailman/listinfo/biopython
>