[Bioperl-l] Re: entrezgene binary ASN

Hilmar Lapp hlapp at gmx.net
Sat Oct 1 21:57:03 EDT 2005


On Oct 1, 2005, at 1:01 PM, Stefan Kirov wrote:

> Hilmar,
> As of now the parser does not seek through the stream, but hopefully 
> it will as soon as I can sit down and do it.

What advantage would that have?

Note that not allowing streams would, first, make entrezgene different 
from all other formats and, second, together with the gene2xml 
conversion requirement, force you to call it differently from all 
other SeqIO parsers (i.e., just passing a string, with or without a 
trailing pipe, wouldn't suffice; you'd have to do a preprocessing 
step). If seeking in the file offers advantages significant enough to 
outweigh that, then great, but even then it should be optional if that 
is possible within reason.
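
As a generic illustration of the seeking question (a minimal Python 
sketch, not BioPerl code; the variable names are made up for this 
example and the behaviour described is that of POSIX systems), a 
regular file supports random access while the read end of a pipe does 
not, which is why a parser that needs to seek cannot take its input 
from a piped command:

```python
import os
import tempfile

# A regular (temporary) file on disk supports seek/tell.
with tempfile.TemporaryFile() as fh:
    fh.write(b"record 1\nrecord 2\n")
    file_seekable = fh.seekable()

# The read end of an OS pipe is a forward-only stream.
read_fd, write_fd = os.pipe()
with os.fdopen(read_fd, "rb") as reader, os.fdopen(write_fd, "wb") as writer:
    pipe_seekable = reader.seekable()

print("file seekable:", file_seekable)
print("pipe seekable:", pipe_seekable)
```

A parser that only reads forward would work with either kind of handle.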

	-hilmar


>  (by the way it is weird but gene2xml will not parse the gunzipped 
> file, so you should not use gzip -d).
> I don't think you are missing anything as far as I can tell.
> Stefan
>
> Hilmar Lapp wrote:
>
>> I've tried to listen in on the exchange but I'm not sure I understand 
>> what the issue is.
>>
>> I.e., does the parser need to seek in the stream? If yes, then piping 
>> won't do any good if it works at all. If no, then the parser should 
>> be perfectly fine with the filename being output from a pipe, and 
>> possibly accept a file handle in substitution too. In that case, the 
>> caller can pipe the actual input through any commands he/she wishes 
>> by simply passing the piped command(s) (e.g. as in "gzip -d -c 
>> file.asn.gz|gene2xml|").
>>
>> The parser doing this auto-magically isn't necessary and doesn't save 
>> a caller that much. Instead, it exposes the parser to liabilities 
>> like path of gzip, path of gene2xml and similar stuff which may not 
>> be identical on all platforms.
>>
>> What am I missing?
>>
>>     -hilmar
>>
>> On Sep 30, 2005, at 12:55 PM, Michael Seewald wrote:
>>
>>> On 9/30/05, Mingyi Liu <mingyi.liu at gpc-biotech.com> wrote:
>>>
>>>>
>>>> I didn't say indexing would break, but the performance of retrieval
>>>> would be horrible. That's why in most situations there's no need to
>>>> use a pipe - after all, anyone who needs to use index & ID-based
>>>> retrieval would convert the binary ASN to a text file anyway (using
>>>> a script, hopefully).
>>>
>>>
>>>
>>> Absolutely right, seeking would be horrible.
>>>
>>> Best wishes,
>>> Michael
>>>
>>> PS: And thanks for providing this great parser!!
>>>
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at portal.open-bio.org
>>> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>>>
>>>
>
> -- 
> Stefan Kirov, Ph.D.
> University of Tennessee/Oak Ridge National Laboratory
> 5700 bldg, PO BOX 2008 MS6164
> Oak Ridge TN 37831-6164
> USA
> tel +865 576 5120
> fax +865-576-5332
> e-mail: skirov at utk.edu
> sao at ornl.gov
>
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------



