[Bioperl-l] Long /labels are wrapped, but can't be read

Chris Fields cjfields at illinois.edu
Wed Oct 7 10:09:11 EDT 2009


On Sep 30, 2009, at 4:50 AM, Adam Sjøgren wrote:

> On Tue, 29 Sep 2009 22:54:04 -0500, Chris wrote:
>
>> Not sure, but this could be a case of 'both'. Labels that are quoted
>> and aren't are currently distinguished via a global hash lookup
>> (%FTQUAL_NO_QUOTE) due to the way the parser works; there is some
>> logic behind this, just can't quite recall at the moment why it is
>> this way.
>
> Yes, I saw that there is a number of qualifiers that aren't quoted
> automatically.
>
> The very easy "fix" for me would be to simply remove "label" from
> %FTQUAL_NO_QUOTE, but I'm not really sure what the reason for not
> quoting all values is, so I was hesitant to just propose that.

IIRC it's basically there to give more control over the output format.
It appears to play a role only in output (via write_seq).
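
To illustrate the kind of lookup I mean, here is a rough sketch. It is
written in Python rather than Perl and is not the actual BioPerl code:
the set contents and the function name are made up for the example, and
only "label" is known from this thread to be in %FTQUAL_NO_QUOTE.

```python
# Hypothetical stand-in for %FTQUAL_NO_QUOTE. Only "label" is confirmed
# by this thread; "codon_start" is included purely for illustration.
NO_QUOTE = {"label", "codon_start"}


def format_qualifier(name, value):
    """Render one feature qualifier for output, quoting unless exempt."""
    if name in NO_QUOTE:
        # Exempt qualifiers are written bare: /label=foo
        return f"/{name}={value}"
    # Everything else is wrapped in double quotes: /note="foo"
    return f'/{name}="{value}"'
```

Removing "label" from the set would make the writer quote it like any
other qualifier, which is the easy "fix" mentioned above.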

>> You could set a hash key for the label in cases where it isn't quoted,
>> that should work. You can also test out the Bio::SeqIO::embldriver
>> version (-format => 'embldriver').
>
> Ah, embldriver reads the wrapped qualifier when it isn't quoted without
> problem. Nice! I hadn't noticed embldriver.
>
> I wonder which one is correct in this case?
>
> And should I switch to using embldriver to read, or does it make sense
> to try and concoct a patch that changes embl?

Bio::SeqIO::embldriver is an attempt to coalesce the parsers into a
generic driver/handler framework. The various parsers (the drivers)
simply split the incoming data stream into chunks of naturally related
data, basically hash refs, and pass them on to a handler object, which
has methods designed to handle each kind of chunk. Basically it's like
a souped-up XML parser, but the data arrives in larger, meaningful
groups (an entire seqfeature, for instance).
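
A toy sketch of the driver/handler split, in Python rather than Perl.
This is not the embldriver code: the chunk layout, the names drive and
CollectingHandler, and the simplified EMBL feature-table handling are
all assumptions made for the example.

```python
def drive(lines, handler):
    """Driver: group EMBL-style FT lines into one dict per feature and
    pass each finished chunk to the handler."""
    chunk = None   # feature currently being assembled
    last = None    # name of the most recent qualifier, for wrapped lines
    for raw in lines:
        if not raw.startswith("FT"):
            continue                      # this toy driver only knows FT lines
        body = raw[2:].rstrip()
        text = body.strip()
        if not text:
            continue
        if not body.startswith(" " * 10):
            # Feature key near the left margin: start a new chunk.
            if chunk is not None:
                handler.handle(chunk)
            key, _, location = text.partition(" ")
            chunk = {"type": "feature", "key": key,
                     "location": location.strip(), "qualifiers": {}}
            last = None
        elif chunk is None:
            continue                      # qualifier with no feature; skip
        elif text.startswith("/"):
            name, _, value = text[1:].partition("=")
            chunk["qualifiers"][name] = value.strip('"')
            last = name
        elif last is not None:
            # Continuation of a wrapped value, quoted or not. Pieces are
            # joined without a space, which suits labels but not notes.
            chunk["qualifiers"][last] += text.strip('"')
    if chunk is not None:
        handler.handle(chunk)


class CollectingHandler:
    """Handler: dispatches each chunk to a method named after its type."""

    def __init__(self):
        self.features = []

    def handle(self, chunk):
        method = getattr(self, "_handle_" + chunk["type"], None)
        if method is not None:
            method(chunk)

    def _handle_feature(self, chunk):
        self.features.append(chunk)
```

The driver knows nothing about sequence objects and the handler knows
nothing about the file format, so in principle either side can be
swapped out independently.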

They're still experimental, but I put them out with the release so
they can be tested. The main problem at the moment is that there is no
specification for how a data chunk is defined and labeled; I am
thinking of using something like JSON for that.

>  Thanks for the feedback!
>
>     Adam
>
> -- 
>                                                          Adam Sjøgren
>                                                    adsj at novozymes.com

np.

chris




