[Biojava-dev] bioperl like blastparser

Andreas Prlic ap3 at sanger.ac.uk
Tue Dec 25 23:42:39 UTC 2007


Hi Mark,

Unfortunately the biojava svn repository is not ready yet.

George has converted our CVS repository into an initial svn dump, which
I have tested and fixed some details in.
This dump has been ready since December 17th (see
dev.open-bio.org:~andreas/biojava-final.svndump.bz2).
The next step is to load it into the public open-bio repository, after
which (and after some more testing) the new biojava repository will be
ready for new commits.

At present I am waiting for somebody with admin rights on the open-bio
servers to do these final steps (or to delegate them and give
permissions to somebody else).

I have tried contacting support at open-bio and root-l, as well as
mailing several people directly, but so far I have not received a
response. It could be that the holiday season is slowing response
times down...

Andreas



On 25 Dec 2007, at 21:44, Mark Schreiber wrote:

> Hi -
>
> When will the subversion system be ready for checkin?
>
> - Mark
>
> On Dec 24, 2007 4:29 PM, Michael Gang <michaelgang at gmail.com> wrote:
>> OK,
>> I made four changes.
>>
>> In the package org.biojava.bio.program.sax, in the class BlastSAXParser:
>> 1) At line 86 I added the field
>>        private String oQueryLength;
>> 2) In the method private void interpret(String poLine) throws SAXException,
>> inside the block "if (iState == IN_HEADER) {", at line 209 I added
>>
>>            // Matches a header line such as "         (N letters)":
>>            // '(' at offset 9 and the line ending in "letters)".
>>            if (poLine.startsWith("(", 9) && poLine.endsWith("letters)")) {
>>                StringTokenizer st = new StringTokenizer(poLine);
>>                // The first token is "(N"; drop the leading "(".
>>                oQueryLength = st.nextToken().substring(1);
>>            }
>> 3) In the method private void emitHeaderIds() throws SAXException,
>> at line 564 I added
>>        oAttQName.setQName("queryLength");
>>        oAtts.addAttribute(oAttQName.getURI(),
>>                           oAttQName.getLocalName(),
>>                           oAttQName.getQName(),
>>                           "CDATA", oQueryLength);
>>
>> 4) In the package org.biojava.bio.program.ssbind, in HeaderStAXHandler.java,
>> I changed the method startElement of the private class QueryIDStAXHandler
>> (line 95) to:
>>
>>        public void startElement(String            uri,
>>                                 String            localName,
>>                                 String            qName,
>>                                 Attributes        attr,
>>                                 DelegationManager dm)
>>            throws SAXException
>>        {
>>            ssContext.getSearchContentHandler().setQueryID(attr.getValue("id"));
>>            // Forward the new attribute, if present, as a search property.
>>            if (attr.getValue("queryLength") != null)
>>            {
>>                ssContext.getSearchContentHandler()
>>                    .addSearchProperty("queryLength", attr.getValue("queryLength"));
>>            }
>>        }
>>    }
>>
>> The query length is now a property of the annotation of a BLAST result.
>> It is really fun to participate in the biojava project.
>>
>> Best regards,
>> Michael
>>
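
Change 2) above can be pulled out into a small standalone method and exercised on its own, which is also the kind of check Mark suggests adding as a JUnit test method. The following is a minimal sketch: the names `QueryLengthSketch` and `parseQueryLength` are hypothetical (not BioJava API), and the sample header line is illustrative, inferred from the `"(... letters)"` test in the patch rather than copied from a real report.

```java
import java.util.StringTokenizer;

/** Standalone sketch of the header-line check from change 2). Hypothetical names. */
public class QueryLengthSketch {

    /**
     * Returns the query length from a header line such as
     * "         (393 letters)", or null if the line does not match.
     */
    static String parseQueryLength(String line) {
        // Same test as the patch: '(' at offset 9 and a trailing "letters)".
        // startsWith(prefix, offset) simply returns false for short lines.
        if (line.startsWith("(", 9) && line.endsWith("letters)")) {
            StringTokenizer st = new StringTokenizer(line);
            // The first whitespace-delimited token is "(393"; drop the "(".
            return st.nextToken().substring(1);
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(parseQueryLength("         (393 letters)")); // 393
        System.out.println(parseQueryLength("Query= gi|12345"));         // null
        System.out.println(parseQueryLength("(393 letters)"));           // null: '(' not at offset 9
    }
}
```

As in the patch, the length is kept as the string that appears in the report, so if a report used digit grouping such as `(1,234 letters)`, the comma would be kept too.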
>>
>> On Dec 24, 2007 2:32 AM, Mark Schreiber <markjschreiber at gmail.com> wrote:
>>> Hi -
>>>
>>> We are currently migrating the code base from CVS to Subversion; after
>>> this it will be possible to check in code again. For small additions it
>>> is usually easier to post the code to the dev list (in the body of the
>>> email, as the list doesn't like attachments) or send it to one of the
>>> regular committers and get them to add it.
>>>
>>> The JUnit tests are the standard test suite. If you have added new
>>> functionality, it would be a good idea to add another test method to
>>> the appropriate JUnit test to make sure it works (and continues to
>>> work in the future).
>>>
>>> - Mark
>>>
>>>
>>> On Dec 23, 2007 11:22 PM, Michael Gang <michaelgang at gmail.com> wrote:
>>>> Hi all,
>>>>
>>>> I have now added the extraction of the query length.
>>>> Can someone explain to me the procedure for checking code in to BioJava?
>>>> I ran the unit tests in the BioJava distribution. Are there additional
>>>> tests available?
>>>>
>>>> Best regards,
>>>> Michael
>>>>
>>>>
>>>> On Dec 21, 2007 9:59 AM, Mark Schreiber <markjschreiber at gmail.com> wrote:
>>>>> Hi -
>>>>>
>>>>> You are not required to turn all BLAST results into objects. Because
>>>>> it is an event-based parser, you can do whatever you want with the
>>>>> events, including turning them into objects or echoing them to STDOUT.
>>>>> Take a look at the examples in the cookbook.
>>>>>
>>>>> It may be that the query length is actually parsed but is not passed
>>>>> on to the object model by the event listeners.
>>>>>
>>>>> - Mark
>>>>>
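
Mark's point that the parser is event based can be illustrated with a plain SAX `ContentHandler` that echoes each element it is told about and picks up a `queryLength` attribute when one appears, without building any object model. This is a minimal sketch using only the JDK's SAX API (`javax.xml.parsers`, `org.xml.sax`); the class name is hypothetical, and the `Header`/`QueryId`/`DatabaseId` input is a hand-written stand-in, not verified output of the BioJava BLAST parser. Hooking such a handler to the BioJava parser instead of the JDK parser is not shown here.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/**
 * Sketch of an event listener that echoes each start tag (with its
 * attributes) to STDOUT and remembers a queryLength attribute if seen.
 * Hypothetical class; not part of BioJava.
 */
public class EchoHandlerSketch extends DefaultHandler {

    final List<String> echoed = new ArrayList<String>();
    String queryLength; // stays null if no element carries the attribute

    @Override
    public void startElement(String uri, String localName,
                             String qName, Attributes attrs) {
        // Rebuild a one-line description of the start tag.
        StringBuilder line = new StringBuilder(qName);
        for (int i = 0; i < attrs.getLength(); i++) {
            line.append(' ').append(attrs.getQName(i))
                .append("=\"").append(attrs.getValue(i)).append('"');
        }
        echoed.add(line.toString());
        System.out.println(line);
        if (attrs.getValue("queryLength") != null) {
            queryLength = attrs.getValue("queryLength");
        }
    }

    /** Parses an XML string with a fresh handler; checked exceptions are wrapped. */
    static EchoHandlerSketch parse(String xml) {
        EchoHandlerSketch handler = new EchoHandlerSketch();
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            parser.parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return handler;
    }

    public static void main(String[] args) {
        EchoHandlerSketch h = parse("<Header><QueryId id=\"q1\" queryLength=\"393\"/>"
                + "<DatabaseId id=\"nr\"/></Header>");
        System.out.println("queryLength = " + h.queryLength);
    }
}
```

Nothing is retained beyond the echoed lines and one string, so memory use does not grow with the number of hits; collecting `echoed` is only there to make the sketch easy to check.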
>>>>>
>>>>> On Dec 21, 2007 12:15 AM, Andreas Prlic <ap3 at sanger.ac.uk> wrote:
>>>>>> Hi Michael,
>>>>>>
>>>>>> The BLAST parser (BlastLikeSAXParser) in BioJava has been around for
>>>>>> a while and is frequently used to parse a variety of BLAST outputs.
>>>>>> Still, it is not complete and cannot parse PSI-BLAST output. We have
>>>>>> had a number of requests about it lately, so I suppose it needs a
>>>>>> little maintenance now.
>>>>>>
>>>>>> Writing a new BLAST parser from scratch would take a significant
>>>>>> amount of time: it takes time to fix all the bugs, add support for
>>>>>> the different BLAST versions and write documentation. Much of this is
>>>>>> already available in BioJava, so I would prefer it if you could
>>>>>> submit patches for the current BLAST parser. Would you also be
>>>>>> interested in collaborating in this direction?
>>>>>> Another feature that would be nice to add is support for sending
>>>>>> BLAST searches off to web services...
>>>>>>
>>>>>> Cheers,
>>>>>> Andreas
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 20 Dec 2007, at 12:54, Michael Gang wrote:
>>>>>>
>>>>>>> Hi All,
>>>>>>>
>>>>>>> I have used the interface of the Java BLAST parser and had two main
>>>>>>> problems with it:
>>>>>>> 1) The BLAST parser does not parse all the information (for example,
>>>>>>> the query length).
>>>>>>> 2) The BLAST parser parses the whole BLAST report into a list, which
>>>>>>> uses a lot of memory.
>>>>>>>
>>>>>>> I would be interested in writing and contributing a BLAST parser that
>>>>>>> parses all the information in the BLAST report and processes it
>>>>>>> iteratively, something like the following BioPerl code (just in Java):
>>>>>>>     use Bio::SearchIO;
>>>>>>>     # format can be 'blast', 'blastxml', 'fasta', ...
>>>>>>>     my $searchio = Bio::SearchIO->new( -format => 'blastxml',
>>>>>>>                                        -file   => 'blastout.xml' );
>>>>>>>     while ( my $result = $searchio->next_result() ) {
>>>>>>>         while ( my $hit = $result->next_hit ) {
>>>>>>>             # process the Bio::Search::Hit::HitI object
>>>>>>>             while ( my $hsp = $hit->next_hsp ) {
>>>>>>>                 # process the Bio::Search::HSP::HSPI object
>>>>>>>             }
>>>>>>>         }
>>>>>>>     }
>>>>>>>
>>>>>>> Would you be interested in such a contribution ?
>>>>>>>
>>>>>>> Best regards,
>>>>>>> Michael
>>>>>>> _______________________________________________
>>>>>>> biojava-dev mailing list
>>>>>>> biojava-dev at lists.open-bio.org
>>>>>>> http://lists.open-bio.org/mailman/listinfo/biojava-dev
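
The iterative style Michael describes can be approximated in Java with the JDK's StAX pull parser (`javax.xml.stream`, available since Java 6), which reads one query at a time instead of building a list for the whole report. This is a sketch under stated assumptions, not BioJava's API or the parser Michael proposed: the class, `QuerySummary`, `ResultListener`, `stream` and `summarize` are hypothetical names, the element names (`Iteration`, `Iteration_query-def`, `Iteration_query-len`, `Hit`, `Hsp`) follow NCBI's BLAST XML format, and the sample document in `main` is a hand-trimmed toy, not real BLAST output. The method reference needs Java 8 or later.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/**
 * Streaming sketch: pulls one query (<Iteration>) at a time from NCBI
 * BLAST XML, so only the current result is held in memory.
 * Hypothetical names throughout; this is not BioJava API.
 */
public class BlastXmlStreamSketch {

    /** Minimal per-query summary (hypothetical). */
    static final class QuerySummary {
        String queryDef;
        int queryLength;
        int hitCount;
        int hspCount;
    }

    /** Hypothetical callback, playing the role of a next_result() loop body. */
    interface ResultListener {
        void onResult(QuerySummary result);
    }

    /** Reads the stream to the end, calling the listener once per query. */
    static void stream(XMLStreamReader r, ResultListener listener)
            throws XMLStreamException {
        QuerySummary current = null;
        while (r.hasNext()) {
            int event = r.next();
            if (event == XMLStreamConstants.START_ELEMENT) {
                String name = r.getLocalName();
                if (name.equals("Iteration")) {
                    current = new QuerySummary();
                } else if (current == null) {
                    continue; // ignore report-level header elements
                } else if (name.equals("Iteration_query-def")) {
                    current.queryDef = r.getElementText();
                } else if (name.equals("Iteration_query-len")) {
                    current.queryLength = Integer.parseInt(r.getElementText().trim());
                } else if (name.equals("Hit")) {
                    current.hitCount++;
                } else if (name.equals("Hsp")) {
                    current.hspCount++;
                }
            } else if (event == XMLStreamConstants.END_ELEMENT
                    && r.getLocalName().equals("Iteration") && current != null) {
                listener.onResult(current); // hand the result off...
                current = null;             // ...and drop our reference to it
            }
        }
    }

    /** Test convenience for small inputs: collects all summaries in a list. */
    static List<QuerySummary> summarize(String xml) {
        List<QuerySummary> out = new ArrayList<>();
        try {
            XMLStreamReader r = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            stream(r, out::add);
            r.close();
        } catch (XMLStreamException e) {
            throw new RuntimeException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        String xml = "<BlastOutput><BlastOutput_iterations>"
            + "<Iteration><Iteration_query-def>q1</Iteration_query-def>"
            + "<Iteration_query-len>393</Iteration_query-len>"
            + "<Iteration_hits><Hit><Hit_hsps><Hsp/><Hsp/></Hit_hsps></Hit>"
            + "<Hit><Hit_hsps><Hsp/></Hit_hsps></Hit></Iteration_hits></Iteration>"
            + "<Iteration><Iteration_query-def>q2</Iteration_query-def>"
            + "<Iteration_query-len>120</Iteration_query-len>"
            + "<Iteration_hits/></Iteration>"
            + "</BlastOutput_iterations></BlastOutput>";
        for (QuerySummary s : summarize(xml)) {
            System.out.println(s.queryDef + " length=" + s.queryLength
                + " hits=" + s.hitCount + " hsps=" + s.hspCount);
        }
        // q1 length=393 hits=2 hsps=3
        // q2 length=120 hits=0 hsps=0
    }
}
```

The callback keeps the sketch short; a `next_result()`-style iterator could wrap the same loop by returning each summary as soon as its `</Iteration>` is read. Only `summarize` collects results into a list, and only for testing.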
>>>>>>
>>>>>> -----------------------------------------------------------------------
>>>>>>
>>>>>> Andreas Prlic      Wellcome Trust Sanger Institute
>>>>>>                               Hinxton, Cambridge CB10 1SA, UK
>>>>>>                               +44 (0) 1223 49 6891
>>>>>>
>>>>>> -----------------------------------------------------------------------
>>>>>>
>>>>>> --
>>>>>>  The Wellcome Trust Sanger Institute is operated by Genome Research
>>>>>>  Limited, a charity registered in England with number 1021457 and a
>>>>>>  company registered in England with number 2742969, whose registered
>>>>>>  office is 215 Euston Road, London, NW1 2BE.
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
