[Bioperl-l] ASN.1 and BioPerl ?

Hilmar Lapp hlapp at gmx.net
Sat Feb 12 16:20:30 EST 2005


The ASN.1 parser would be very useful, in particular for implementing 
the NCBI Gene parser, I suppose.

I do suggest, though, that you publish this as a separate module on 
CPAN, since it presumably is (or is meant to be?) generically useful; 
I completely agree with Chris on this.

In the bioinf world, ASN.1 is really only used by NCBI, but for them 
it's their bread and butter. I concur with Chris that it would be most 
useful and preferred if NCBI just exported their data to XML format to 
make everyone's life a bit easier, and in fact they have done so on 
certain occasions in the more recent past.

For instance, they offer for download a tool, statically linked 
against the NCBI toolkit, that converts a sequence record from 
ASN.1 to XML format. Their documentation also claims that every 
ASN.1 definition in their data model can be translated into XML.

In practice this doesn't appear to be quite so easy; otherwise I don't 
understand why there is still neither an XML download nor a similar 
ASN.1->XML converter for NCBI Gene, even though the former has been 
promised in the transition FAQs for several months (NCBI Gene is 
replacing NCBI LocusLink).

I need an NCBI Gene parser implemented in the Bio::SeqIO framework 
returning compatible Bio::SeqI objects within the next few weeks. The 
speed needs to be at least several records per second, ideally 10/s or 
higher.

My understanding is that Peter has a grammar-based parser in Java 
(I don't know its speed), and Steve has a Parse::RecDescent-based 
parser in Perl (not BioPerl), which is (as expected) slow.

I've seen Graham Barr's module on CPAN but haven't tried it yet. It 
seemed to me that you need the ASN.1 model definition to start with, 
which I haven't found in any obvious or not-so-obvious place on the 
NCBI FTP site, so either I missed something or you have to download 
the entire toolkit, or something else.

How does your library compare to Graham's? Would you be willing to send 
me the code?

	-hilmar

On Thursday, February 10, 2005, at 03:28  PM, Pierre Rioux wrote:

> Hello BioPerl users and coders,
>
> I haven't seen much (or any) support for parsing ASN.1
> documents in BioPerl; I assume it's either because the
> community doesn't need it much, or there are already some
> Perl ASN.1 libraries available elsewhere (but I've not
> looked very hard, I have to admit). There's some code I
> could contribute about that, but before I do anything
> I'd like to know what other developers think. The
> code I have is basically a pure Perl text parser that
> can read ASN.1 text (in .prt format, like NCBI publishes)
> and builds a data structure (a hierarchy of hash tables)
> to represent it. The object model is quite primitive right
> now, but eventually it could be extended into a nice API.
> My questions are:
>
>     1) Would this be useful to anyone?
>     2) If so, where (roughly) in the BioPerl code
>        tree would such a thing go?
>     3) Any other comments or recommendations welcome.
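
To make the idea concrete, here is a minimal sketch of a text parser
of the kind described above: it reads ASN.1 value notation (as in a
.prt file) and builds nested hash tables. It is written in Python
purely for illustration and is not Pierre's Perl code. The function
names (tokenize, parse_value, parse_block, parse_asn1_text) are
hypothetical, and SAMPLE is a shortened, made-up excerpt in the style
of gc.prt, not real NCBI data. Every named field maps to a list of
values, on the assumption that a field name may repeat within a block.

```python
import re

TOKEN = re.compile(r'''
    \s+                          # whitespace (skipped)
  | --[^\n]*                     # "--" comment to end of line (skipped)
  | (?P<str>"(?:[^"]|"")*")      # quoted string; "" is an escaped quote
  | (?P<punct>::=|[{},])         # structural tokens
  | (?P<word>[^\s{},"]+)         # identifiers, numbers, enumerated values
''', re.VERBOSE)


def tokenize(text):
    """Yield (kind, token) pairs, dropping whitespace and comments."""
    pos = 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character at offset {pos}")
        pos = m.end()
        if m.lastgroup:  # None when only whitespace/comment matched
            yield m.lastgroup, m.group(m.lastgroup)


def parse_value(tokens, i):
    """Parse one value starting at tokens[i]; return (value, next_index)."""
    kind, tok = tokens[i]
    if kind == "punct":
        if tok == "{":
            return parse_block(tokens, i + 1)
        raise ValueError(f"unexpected {tok!r}")
    if kind == "str":
        return tok[1:-1].replace('""', '"'), i + 1
    if re.fullmatch(r"-?\d+", tok):
        return int(tok), i + 1
    return tok, i + 1  # bare identifier, kept as a string


def parse_block(tokens, i):
    """Parse the inside of { ... }; tokens[i] is the token after '{'."""
    named, items = {}, []
    while tokens[i][1] != "}":
        kind, tok = tokens[i]
        if tok == ",":
            i += 1
        elif kind == "word" and tokens[i + 1][1] not in (",", "}"):
            value, i = parse_value(tokens, i + 1)   # "name value" field
            named.setdefault(tok, []).append(value)
        else:
            value, i = parse_value(tokens, i)       # anonymous element
            items.append(value)
    # Simplification: a block is either all named fields or all elements.
    return (named if named else items), i + 1


def parse_asn1_text(text):
    """Parse 'TypeName ::= value' into {TypeName: value}."""
    tokens = list(tokenize(text))
    if len(tokens) < 3 or tokens[1][1] != "::=":
        raise ValueError("expected 'TypeName ::= value'")
    value, _ = parse_value(tokens, 2)
    return {tokens[0][1]: value}


SAMPLE = '''
--  shortened, made-up excerpt in the style of gc.prt
Genetic-code-table ::= {
 {
  name "Standard" ,
  name "SGC0" ,
  id 1 ,
  ncbieaa  "FFLL" ,
  sncbieaa "---M"
 } ,
 {
  name "Vertebrate Mitochondrial" ,
  id 2 ,
  ncbieaa  "FFLL" ,
  sncbieaa "----"
 }
}
'''

tables = parse_asn1_text(SAMPLE)["Genetic-code-table"]
```

Here tables is a list with one hash per genetic code, so
tables[0]["name"] holds both names of the first entry. The sketch
does not validate against an ASN.1 module definition and will fail
with an IndexError on truncated input; a real parser would need
proper error handling.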
>
> I am using this library in my applications to read the file
> 'gc.prt' as published regularly by NCBI; this file describes
> a set of genetic codes, and NCBI updates it from time to time.
> In BioPerl, I noticed that the same information is hardcoded
> in the class Bio::Tools::CodonTable, which means that whenever
> NCBI updates the gc.prt file the bioperl class also needs
> to be modified. An alternative I suggest is that if gc.prt
> can be found on the local system (using the NCBI environment
> variable) then Bio::Tools::CodonTable could read it and keep up
> to date automatically (otherwise, it could keep using the
> hardcoded codon tables it currently has).
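
The fallback suggested above could look roughly like the following
sketch. It is Python for illustration only, not BioPerl code. The
environment variable name "NCBI", the two locations searched beneath
it, and the names BUILTIN_TABLES, find_gc_prt and load_codon_tables
are all assumptions; the post does not say where gc.prt would live.
The parser is passed in as a function so that any ASN.1 text parser
can be plugged in.

```python
import os
from pathlib import Path

# Stand-in for the tables a class would ship hardcoded (made-up content).
BUILTIN_TABLES = {1: "Standard"}


def find_gc_prt(env=None, var="NCBI"):
    """Return the path to a local gc.prt, or None if it cannot be found.

    Assumes `var` names a directory and that gc.prt sits either directly
    in it or in a data/ subdirectory (both locations are guesses).
    """
    env = os.environ if env is None else env
    root = env.get(var)
    if not root:
        return None
    for candidate in (Path(root) / "gc.prt", Path(root) / "data" / "gc.prt"):
        if candidate.is_file():
            return candidate
    return None


def load_codon_tables(parse, env=None):
    """Return (tables, source): parsed gc.prt if usable, else the built-ins."""
    path = find_gc_prt(env)
    if path is None:
        return BUILTIN_TABLES, "builtin"
    try:
        return parse(path.read_text()), str(path)
    except (OSError, ValueError):
        # Unreadable or unparsable file: keep using the hardcoded tables.
        return BUILTIN_TABLES, "builtin"
```

Returning the source alongside the tables lets a caller report
whether the data came from a local gc.prt or from the built-in copy.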
>
> I am sure other parts of BioPerl could probably benefit
> a little from being able to read NCBI's ASN.1 data files,
> unless there's some kind of philosophical opposition to
> the idea?
>
> I'm willing to do all the work, AND design a proper and
> clean OO interface for the code, provided the answer to
> question #1 above is 'yes'...
>
> Pierre Rioux
>
> pierre_rioux {round symbol} yahoo {smallest char} com
>
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------



