[Biojava-dev] bioperl like blastparser
Andreas Prlic
ap3 at sanger.ac.uk
Tue Dec 25 23:42:39 UTC 2007
Hi Mark,
Unfortunately the biojava svn repository is not ready yet.
George has converted our CVS to an initial svn dump, which I tested,
fixing some details along the way.
This dump has been ready since December 17th (see
dev.open-bio.org:~andreas/biojava-final.svndump.bz2).
The next step is to load this into the public open-bio repository,
after which (and some more testing) the new biojava repository would
be ready for new commits.
At present I am waiting for somebody with admin rights on the
open-bio servers to do these final steps
(or to delegate and give permissions to somebody else).
I tried to contact support at open-bio and root-l, and also mailed
several people directly,
but so far I have not had a response. It could be that the holiday
season is slowing response times down...
Andreas
On 25 Dec 2007, at 21:44, Mark Schreiber wrote:
> Hi -
>
> When will the subversion system be ready for checkin?
>
> - Mark
>
> On Dec 24, 2007 4:29 PM, Michael Gang <michaelgang at gmail.com> wrote:
>> OK,
>> I made four changes.
>> In the package org.biojava.bio.program.sax, in class BlastSaxParser:
>> 1) At line 86 I added the variable
>>      private String oQueryLength;
>> 2) In the method private void interpret(String poLine) throws SAXException,
>>    inside the block "if (iState == IN_HEADER) {", at line 209 I added
>>
>>      if (poLine.startsWith("(", 9) && poLine.endsWith("letters)")) {
>>          StringTokenizer st = new StringTokenizer(poLine);
>>          oQueryLength = st.nextToken().substring(1);
>>      }
>> 3) In the method private void emitHeaderIds() throws SAXException,
>>    at line 564 I added
>>      oAttQName.setQName("queryLength");
>>      oAtts.addAttribute(oAttQName.getURI(),
>>                         oAttQName.getLocalName(),
>>                         oAttQName.getQName(),
>>                         "CDATA", oQueryLength);
>>
>> In the package org.biojava.bio.program.ssbind, in HeaderStAXHandler.java:
>> 4) In the private class QueryIDStAXHandler, at line 95, I changed the
>>    method startElement:
>>
>>      public void startElement(String uri,
>>                               String localName,
>>                               String qName,
>>                               Attributes attr,
>>                               DelegationManager dm)
>>          throws SAXException
>>      {
>>          ssContext.getSearchContentHandler().setQueryID(attr.getValue("id"));
>>          if (attr.getValue("queryLength") != null)
>>          {
>>              ssContext.getSearchContentHandler().addSearchProperty(
>>                  "queryLength", attr.getValue("queryLength"));
>>          }
>>      }
>>  }
>>
>> Now query length is a property of the annotation of a blast result.
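The header-line check from change 2 can be tried in isolation. The following is a minimal standalone sketch, not the BioJava code itself: the class name QueryLengthSketch, the method name extractQueryLength and the sample header line are assumptions made for illustration. The sample line is inferred from the patch's own test (an opening parenthesis at column 9 and a trailing "letters)").

```java
import java.util.StringTokenizer;

// Hypothetical helper class; only the condition and the tokenizing
// step are taken from the patch quoted above.
final class QueryLengthSketch {

    /** Returns the query length as a string, or null if the line does not match. */
    static String extractQueryLength(String line) {
        // Same test as the patch: "(" at offset 9 and the line ends in "letters)".
        if (line.startsWith("(", 9) && line.endsWith("letters)")) {
            StringTokenizer st = new StringTokenizer(line);
            // The first whitespace-delimited token is "(<digits>"; drop the "(".
            return st.nextToken().substring(1);
        }
        return null;
    }

    public static void main(String[] args) {
        // Assumed sample lines, not taken from a real report.
        System.out.println(extractQueryLength("         (1234 letters)"));
        System.out.println(extractQueryLength("Query= my_sequence"));
    }
}
```

Lines that do not match, including lines shorter than nine characters, simply yield null, which mirrors the way the patch leaves oQueryLength unset.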
>> It is really fun to participate in the biojava project.
>>
>> Best regards,
>> Michael
>>
>>
>> On Dec 24, 2007 2:32 AM, Mark Schreiber <markjschreiber at gmail.com>
>> wrote:
>>> Hi -
>>>
>>> We are currently merging the code base into subversion (from CVS);
>>> after this it will be possible to check in code again. For small
>>> additions it is usually easier to post the code to the dev list (in
>>> the body of the email, as the list doesn't like attachments) or send
>>> it to one of the regular committers and get them to add it.
>>>
>>> The JUnit tests are the standard test package. If you have added new
>>> functionality it would be a good idea to add another test method in
>>> the appropriate JUnit test to make sure it works (and continues to
>>> work in the future).
>>>
>>> - Mark
>>>
>>>
>>> On Dec 23, 2007 11:22 PM, Michael Gang <michaelgang at gmail.com>
>>> wrote:
>>>> Hi all,
>>>>
>>>> I've now added the extraction of the query length.
>>>> Can someone explain to me the procedure for checking in code to
>>>> biojava? I ran the unit tests in the biojava distribution. Are there
>>>> additional tests available?
>>>>
>>>> Best regards,
>>>> Michael
>>>>
>>>>
>>>> On Dec 21, 2007 9:59 AM, Mark Schreiber
>>>> <markjschreiber at gmail.com> wrote:
>>>>> Hi -
>>>>>
>>>>> It is not required that you turn all Blast results into objects.
>>>>> Because it is an event-based parser, you can do what you want with
>>>>> the events, including turning them into objects or echoing them to
>>>>> STDOUT. Take a look at the examples in the cookbook.
>>>>>
>>>>> It may be that the query length is actually parsed but is not
>>>>> passed on to the object model by the event listeners.
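To illustrate the event-based style described above, here is a small sketch that echoes events instead of building objects. It assumes the JDK's own SAX parser and a made-up XML snippet rather than BioJava's blast parser; the class name EventEchoSketch and the element names Header and QueryId are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class: handles SAX events without an object model.
final class EventEchoSketch {

    /** Returns one line per start-element event: the element name plus its attributes. */
    static List<String> echoEvents(String xml) {
        final List<String> seen = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                StringBuilder sb = new StringBuilder(qName);
                for (int i = 0; i < atts.getLength(); i++) {
                    sb.append(' ').append(atts.getQName(i))
                      .append('=').append(atts.getValue(i));
                }
                seen.add(sb.toString());
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            // Wrap checked parser exceptions to keep the sketch short.
            throw new IllegalStateException(e);
        }
        return seen;
    }

    public static void main(String[] args) {
        // Made-up input; the element and attribute names are assumptions.
        String xml = "<Header><QueryId id=\"q1\" queryLength=\"1234\"/></Header>";
        for (String line : echoEvents(xml)) {
            System.out.println(line);
        }
    }
}
```

A listener that wants objects would populate them in startElement instead of appending strings; a listener that ignores an attribute such as queryLength drops it silently, which is the situation suspected above.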
>>>>>
>>>>> - Mark
>>>>>
>>>>>
>>>>> On Dec 21, 2007 12:15 AM, Andreas Prlic <ap3 at sanger.ac.uk> wrote:
>>>>>> Hi Michael,
>>>>>>
>>>>>> The blast parser (BlastLikeSaxParser) in BioJava has been around for
>>>>>> a while and is frequently used to parse a variety of different
>>>>>> blast outputs. Still, it is not complete and cannot parse PSI-BLAST.
>>>>>> We have had a number of requests about it lately, so I suppose it
>>>>>> needs a little maintenance now.
>>>>>>
>>>>>> Writing a new blast parser from scratch would take a significant
>>>>>> amount of time: time to fix all the bugs, add support for the
>>>>>> different blast versions and write documentation. Much of this is
>>>>>> already available in BioJava, so I would prefer it if you could
>>>>>> submit patches for the current blast parser. Would you also be
>>>>>> interested in collaborating in this direction?
>>>>>> Another feature that would be nice to support is the possibility
>>>>>> of sending off blast searches to web services...
>>>>>>
>>>>>> Cheers,
>>>>>> Andreas
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 20 Dec 2007, at 12:54, Michael Gang wrote:
>>>>>>
>>>>>>> Hi All,
>>>>>>>
>>>>>>> I used the interface of the java blast parser.
>>>>>>> I had mainly two problems with it:
>>>>>>> 1) The blast parser does not parse all the information (for
>>>>>>>    example, query length).
>>>>>>> 2) The blast parser parses the whole blast report into a list,
>>>>>>>    which eats a lot of memory.
>>>>>>>
>>>>>>> I would be interested in writing and contributing a blast parser
>>>>>>> that parses all the information in the blast report and parses it
>>>>>>> iteratively, something like the following bioperl code (just in Java):
>>>>>>> use Bio::SearchIO;
>>>>>>> # format can be 'fasta', 'blast'
>>>>>>> my $searchio = new Bio::SearchIO( -format => 'blastxml',
>>>>>>>                                   -file   => 'blastout.xml' );
>>>>>>> while ( my $result = $searchio->next_result() ) {
>>>>>>>     while ( my $hit = $result->next_hit ) {
>>>>>>>         # process the Bio::Search::Hit::HitI object
>>>>>>>         while ( my $hsp = $hit->next_hsp ) {
>>>>>>>             # process the Bio::Search::HSP::HSPI object
>>>>>>>         }
>>>>>>>     }
>>>>>>> }
>>>>>>>
>>>>>>> Would you be interested in such a contribution ?
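For comparison, a rough Java sketch of the same pull-style iteration is given below. It is only an illustration of the idea and assumes the JDK's StAX reader over BLAST XML output; it is not an existing BioJava API. The class name HitIteratorSketch and the sample document are made up, and only the hit identifiers (the Hit_id elements) are returned rather than full hit or HSP objects.

```java
import java.io.StringReader;
import java.util.Iterator;
import java.util.NoSuchElementException;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical iterator: walks the XML stream and hands out one hit id at a time.
final class HitIteratorSketch implements Iterator<String> {
    private final XMLStreamReader reader;
    private String nextId; // the single id read ahead, or null when exhausted

    HitIteratorSketch(String xml) {
        try {
            reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            advance();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }

    // Reads forward only as far as the next <Hit_id> element.
    private void advance() throws XMLStreamException {
        nextId = null;
        while (reader.hasNext()) {
            if (reader.next() == XMLStreamConstants.START_ELEMENT
                    && "Hit_id".equals(reader.getLocalName())) {
                nextId = reader.getElementText();
                return;
            }
        }
    }

    @Override
    public boolean hasNext() {
        return nextId != null;
    }

    @Override
    public String next() {
        if (nextId == null) {
            throw new NoSuchElementException();
        }
        String current = nextId;
        try {
            advance();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
        return current;
    }

    public static void main(String[] args) {
        // Made-up, heavily trimmed document; real reports contain many more elements.
        String xml = "<Iteration_hits>"
                + "<Hit><Hit_id>hit1</Hit_id></Hit>"
                + "<Hit><Hit_id>hit2</Hit_id></Hit>"
                + "</Iteration_hits>";
        Iterator<String> hits = new HitIteratorSketch(xml);
        while (hits.hasNext()) {
            System.out.println(hits.next());
        }
    }
}
```

Because the reader is advanced one hit at a time, only the current identifier is held in memory, which addresses the concern in point 2 above.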
>>>>>>>
>>>>>>> Best regards,
>>>>>>> Michael
>>>>>>> _______________________________________________
>>>>>>> biojava-dev mailing list
>>>>>>> biojava-dev at lists.open-bio.org
>>>>>>> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>>>>>>
>>>>>> -----------------------------------------------------------------------
>>>>>>
>>>>>> Andreas Prlic Wellcome Trust Sanger Institute
>>>>>> Hinxton, Cambridge CB10 1SA, UK
>>>>>> +44 (0) 1223 49 6891
>>>>>>
>>>>>> -----------------------------------------------------------------------
>>>>>>
>>>>>> --
>>>>>> The Wellcome Trust Sanger Institute is operated by Genome Research
>>>>>> Limited, a charity registered in England with number 1021457 and a
>>>>>> company registered in England with number 2742969, whose registered
>>>>>> office is 215 Euston Road, London, NW1 2BE.
>>>>>>
-----------------------------------------------------------------------
Andreas Prlic Wellcome Trust Sanger Institute
Hinxton, Cambridge CB10 1SA, UK
+44 (0) 1223 49 6891
-----------------------------------------------------------------------