[Bioperl-l] Bio::Seq -> Solr (Lucene) ?

Chris Fields cjfields at uiuc.edu
Thu Aug 30 02:01:45 UTC 2007


On Aug 29, 2007, at 5:11 PM, Jay Hannah wrote:

> Please slap me if I'm hysterical.
>
> I'm seeking a broad bioinformatics search engine platform. I want to
> take gobs of data in gobs of formats and allow people to search it on
> the web.
>
> - Entrez is awesome. Unfortunately I don't see anything in the NCBI
> toolkit that helps me run my own version of it. Even a tiny one. After
> an initial "check out our toolkit" response from NCBI I don't seem to be
> getting anywhere. Maybe I'm not communicating enough or well enough.

No.  I have had non-responses from NCBI before; they may just be too
busy.  Warnock's dilemma probably applies.

> - EB-eye Search is slick. I don't see any developer kit or source code
> of any kind and I've gotten no response to my emails to them.

Not sure of this one personally.

> - LuceGene is very cool.
> ...
> I don't know Java.

...but you could write a (Perl) wrapper around it.  You can try
contacting Don Gilbert about it, though I think he's been trying out
Chado.

> - Solr is really neat. It's easy to install and gives a simple/powerful
> XML API to populate a Lucene index.
> ... so ...
>
> I'm thinking BioPerl knows how to parse lots of formats into a Bio::Seq.
>
> ...
>
> I'm thinking that would be really cool and I'm going to write it.
>
> Now's your chance to slap me.

No need.

> Since I haven't started yet, what would I call this thing?
> Bio::SeqIO::Solr?  (and I wouldn't implement the I part?)
>
> Thanks,
>
> Jay Hannah
> http://clab.ist.unomaha.edu/CLAB/index.php/User:Jhannah
>
> More notes:
> http://clab.ist.unomaha.edu/CLAB/index.php/RT11

The way I would go about it is to use an established XML schema as a
starting point and implement a writer for it (if BioPerl doesn't
already support it).  That's better than reinventing (a constantly
reinvented) wheel by starting up a brand-new schema of your own.
INSDSeq (http://www.insdc.org/page.php?page=xmlstatus) is one I've
been wanting to add for a while but haven't had time to work on; there
are several other examples.  Note that a few of the XML formats
currently supported in BioPerl, such as bsml and game, have had little
to no development over the years in favor of newer (better?) XML
flavors, so it likely isn't worth working with those.
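
To make the "writer" idea concrete, here is a minimal sketch of the
Solr side only (an illustration, not code from this thread).  It is in
Python with the standard library so that it stands alone; in BioPerl
the same logic would sit in a SeqIO-style writer.  It assumes each
sequence record has already been flattened to a plain dict.  The
function name and the field names (id, description, organism, keyword,
sequence) are hypothetical and would have to match whatever fields the
Solr schema defines.  The only fixed part is Solr's XML update message
shape: <add>, containing <doc>, containing <field name="...">.

```python
import xml.etree.ElementTree as ET


def records_to_solr_add(records):
    """Build a Solr <add> update message from an iterable of dicts.

    Each dict is one document. A list or tuple value becomes several
    <field> elements with the same name (a multi-valued field).
    """
    add = ET.Element("add")
    for record in records:
        doc = ET.SubElement(add, "doc")
        for name, value in record.items():
            if value is None:
                continue  # skip annotations the record does not have
            values = value if isinstance(value, (list, tuple)) else [value]
            for item in values:
                field = ET.SubElement(doc, "field", {"name": name})
                field.text = str(item)  # ElementTree escapes <, >, &
    return ET.tostring(add, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical record; the field names are assumptions, not a standard.
    example = [
        {
            "id": "EXAMPLE_0001",
            "description": "made-up example record",
            "organism": "Escherichia coli",
            "keyword": ["example", "test"],
            "sequence": "ATGGCGTAA",
        }
    ]
    print(records_to_solr_add(example))
```

The resulting string is what would be POSTed to the Solr update
handler, followed by a commit.  Starting from an established schema
such as INSDSeq would mostly change where the dict keys come from, not
this step.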

chris



