[Biojava-dev] bioperl like blastparser

Chris Fields cjfields at uiuc.edu
Wed Dec 26 14:59:46 UTC 2007


It looks like someone got around to it already (biojava/biojava-live  
is new, permissions look set).  Is everything working?

chris

On Dec 26, 2007, at 12:32 AM, Jason Stajich wrote:

> You just need to put the repositor(ies) in
> /home/svn-repositories/biojava
>
> anyone in the biojava group can write there.
> you'll want to delete the existing biojava-live that is in there.
>
> I'm traveling most of the 26th and will be on vacation most of the week,
> but will check in when I have a chance.
>
> -jason
>
> On Dec 25, 2007, at 3:42 PM, Andreas Prlic wrote:
>
>> Hi Mark,
>>
>> Unfortunately the biojava svn repository is not ready yet.
>>
>> George has converted our CVS to an initial svn dump, which I tested,
>> fixing some details along the way.
>> This dump has been ready since December 17th (see
>> dev.open-bio.org:~andreas/biojava-final.svndump.bz2).
>> The next step is to load this into the public open-bio repository,  
>> after which (and some more testing)  the new biojava repository  
>> would be ready for new commits.
>>
>> At present I am waiting for somebody who has admin rights on
>> the open-bio servers to do these final steps
>> (or to delegate and give permissions to somebody else).
>>
>> I tried to contact support at open-bio, root-l, as well as mailing  
>> several people directly,
>> but so far I have not received a response. It could be that the holiday
>> season is slowing response times down...
>>
>> Andreas
>>
>>
>>
>> On 25 Dec 2007, at 21:44, Mark Schreiber wrote:
>>
>>> Hi -
>>>
>>> When will the subversion system be ready for checkin?
>>>
>>> - Mark
>>>
>>> On Dec 24, 2007 4:29 PM, Michael Gang <michaelgang at gmail.com> wrote:
>>>> OK,
>>>> I made four changes.
>>>> In the package org.biojava.bio.program.sax, in class BlastSaxParser:
>>>> 1) At line 86 I added the variable
>>>> private String oQueryLength;
>>>> 2) In the method private void interpret(String poLine) throws SAXException,
>>>> inside the block "if (iState == IN_HEADER) {",
>>>> at line 209 I added
>>>>
>>>> if (poLine.startsWith("(", 9) && poLine.endsWith("letters)")) {
>>>>     StringTokenizer st = new StringTokenizer(poLine);
>>>>     oQueryLength = st.nextToken().substring(1);
>>>> }
>>>> 3) In the method private void emitHeaderIds() throws SAXException,
>>>> at line 564 I added
>>>> oAttQName.setQName("queryLength");
>>>>       oAtts.addAttribute(oAttQName.getURI(),
>>>>                          oAttQName.getLocalName(),
>>>>                          oAttQName.getQName(),
>>>>                          "CDATA", oQueryLength);
>>>>
>>>> In the package org.biojava.bio.program.ssbind, in HeaderStAXHandler.java:
>>>> 4) In the private class QueryIDStAXHandler, at line 95, I changed the
>>>> method startElement:
>>>>
>>>>       public void startElement(String            uri,
>>>>                                String            localName,
>>>>                                String            qName,
>>>>                                Attributes        attr,
>>>>                                DelegationManager dm)
>>>>       throws SAXException
>>>>       {
>>>>           ssContext.getSearchContentHandler().setQueryID(attr.getValue("id"));
>>>>           if (attr.getValue("queryLength") != null)
>>>>           {
>>>>               ssContext.getSearchContentHandler().addSearchProperty("queryLength",
>>>>                   attr.getValue("queryLength"));
>>>>           }
>>>>       }
>>>>   }
>>>>
>>>> Now query length is a property of the annotation of a blast result.
>>>> It is really fun to participate in the biojava project.
>>>>
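The header-line check and the attribute hand-off described in changes 2 to 4 can be sketched in isolation as follows. This is only an illustration, not the BioJava code itself: the class QueryLengthSketch and its methods extractQueryLength and headerAttributes are hypothetical names, and stripping thousands separators such as "1,045" is an assumption about the report format. The only library calls used are AttributesImpl.addAttribute and Attributes.getValue from the standard org.xml.sax API.

```java
import org.xml.sax.helpers.AttributesImpl;

// Hypothetical helper for illustration only; it is not part of BioJava.
class QueryLengthSketch {

    // Returns the digits from a header line such as "         (1045 letters)",
    // or null when the line does not have that shape.
    static String extractQueryLength(String line) {
        String trimmed = line.trim();
        if (!trimmed.startsWith("(") || !trimmed.endsWith("letters)")) {
            return null;
        }
        int space = trimmed.indexOf(' ');
        if (space < 2) {
            return null; // nothing between "(" and the first space
        }
        // Assumption: lengths may carry thousands separators, e.g. "1,045".
        String digits = trimmed.substring(1, space).replace(",", "");
        for (char c : digits.toCharArray()) {
            if (!Character.isDigit(c)) {
                return null;
            }
        }
        return digits;
    }

    // Builds the attributes of a header element; queryLength is only
    // added when it was found, so a reader must check for null.
    static AttributesImpl headerAttributes(String queryId, String queryLength) {
        AttributesImpl atts = new AttributesImpl();
        atts.addAttribute("", "id", "id", "CDATA", queryId);
        if (queryLength != null) {
            atts.addAttribute("", "queryLength", "queryLength", "CDATA", queryLength);
        }
        return atts;
    }

    public static void main(String[] args) {
        String len = extractQueryLength("         (1045 letters)");
        AttributesImpl atts = headerAttributes("query1", len);
        // Mirrors the null check in the startElement change above.
        if (atts.getValue("queryLength") != null) {
            System.out.println("queryLength=" + atts.getValue("queryLength"));
        }
    }
}
```

Because the attribute is optional, the consumer side checks getValue("queryLength") for null before using it, as the startElement change does.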
>>>> Best regards,
>>>> Michael
>>>>
>>>>
>>>> On Dec 24, 2007 2:32 AM, Mark Schreiber  
>>>> <markjschreiber at gmail.com> wrote:
>>>>> Hi -
>>>>>
>>>>> We are currently merging the code base into subversion (from CVS).
>>>>> After this it will be possible to check in code again. For small
>>>>> additions it is usually easier to post the code to the dev list (in
>>>>> the body of the email, as the list doesn't like attachments) or to
>>>>> send it to one of the regular committers and get them to add it.
>>>>>
>>>>> The JUnit tests are the standard test package. If you have added  
>>>>> new
>>>>> functionality it would be a good idea to add another test method  
>>>>> in
>>>>> the appropriate JUnit test to make sure it works (and continues to
>>>>> work in the future).
>>>>>
>>>>> - Mark
>>>>>
>>>>>
>>>>> On Dec 23, 2007 11:22 PM, Michael Gang <michaelgang at gmail.com>  
>>>>> wrote:
>>>>>> Hi all,
>>>>>>
>>>>>> I've now added the extraction of the query length.
>>>>>> Can someone explain to me the procedure for checking in code to
>>>>>> biojava? I ran the unit tests in the biojava distribution. Are there
>>>>>> additional tests available?
>>>>>>
>>>>>> Best regards,
>>>>>> Michael
>>>>>>
>>>>>>
>>>>>> On Dec 21, 2007 9:59 AM, Mark Schreiber  
>>>>>> <markjschreiber at gmail.com> wrote:
>>>>>>> Hi -
>>>>>>>
>>>>>>> It is not required that you turn all Blast results into objects.
>>>>>>> Because it is an event-based parser, you can do what you want with
>>>>>>> the events, including turning them into objects or echoing them to
>>>>>>> STDOUT. Take a look at the examples in the cookbook.
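To make the event-based idea concrete, here is a minimal sketch using only the JDK's standard SAX classes. It is not the BioJava parser or a cookbook example: the class EventEchoSketch, the handler EchoHandler, and the element names Hit and HSP in the sample input are all made up for illustration. The handler turns each start-element event into a line of text instead of building objects.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class; element names in the sample are invented.
class EventEchoSketch {

    // Records one line per start-element event rather than building objects.
    static class EchoHandler extends DefaultHandler {
        final List<String> lines = new ArrayList<>();

        @Override
        public void startElement(String uri, String localName,
                                 String qName, Attributes atts) {
            StringBuilder sb = new StringBuilder(qName);
            for (int i = 0; i < atts.getLength(); i++) {
                sb.append(' ').append(atts.getQName(i))
                  .append('=').append(atts.getValue(i));
            }
            lines.add(sb.toString());
        }
    }

    // Parses the XML text and returns the echoed event lines.
    static List<String> echo(String xml) {
        EchoHandler handler = new EchoHandler();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("could not parse input", e);
        }
        return handler.lines;
    }

    public static void main(String[] args) {
        String xml = "<Hit id=\"h1\"><HSP score=\"42\"/></Hit>";
        for (String line : echo(xml)) {
            System.out.println(line); // echo each event to STDOUT
        }
    }
}
```

The same handler could equally construct objects from the events; what to do with them is left entirely to the listener.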
>>>>>>>
>>>>>>> It may be that the query length is actually parsed but is not  
>>>>>>> passed
>>>>>>> onto the object model by the event listeners.
>>>>>>>
>>>>>>> - Mark
>>>>>>>
>>>>>>>
>>>>>>> On Dec 21, 2007 12:15 AM, Andreas Prlic <ap3 at sanger.ac.uk>  
>>>>>>> wrote:
>>>>>>>> Hi Michael,
>>>>>>>>
>>>>>>>> The blast parser (BlastLikeSaxParser) in BioJava has been around for
>>>>>>>> a while and is frequently used to parse a variety of different
>>>>>>>> blast outputs. Still, it is not complete and cannot parse PSI blast.
>>>>>>>> We have had a number of requests about it lately, so I suppose it
>>>>>>>> needs a little maintenance now.
>>>>>>>>
>>>>>>>> Writing a new blast parser from scratch would involve a significant
>>>>>>>> amount of time. It will take time to fix all the bugs, add support
>>>>>>>> for the different blast versions and write documentation. Much of
>>>>>>>> this is already available in BioJava, so I would prefer it if you
>>>>>>>> could submit patches for the current blast parser. Would you also be
>>>>>>>> interested in collaborating in this direction?
>>>>>>>> Another feature that would be nice to add support for is the
>>>>>>>> possibility to send off blast searches to webservices...
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>> Andreas
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On 20 Dec 2007, at 12:54, Michael Gang wrote:
>>>>>>>>
>>>>>>>>> Hi All,
>>>>>>>>>
>>>>>>>>> I used the interface of the java blast parser.
>>>>>>>>> I had mainly two problems with it:
>>>>>>>>> 1) The blast parser does not parse all the information (for  
>>>>>>>>> example
>>>>>>>>> query length)
>>>>>>>>> 2) The blast parser parses the whole blast report into a  
>>>>>>>>> list which
>>>>>>>>> eats a lot of memory.
>>>>>>>>>
>>>>>>>>> I would be interested to write and contribute a blast parser  
>>>>>>>>> which
>>>>>>>>> parses all the information of the blast and parses the blast
>>>>>>>>> iteratively.
>>>>>>>>> Something like the following code in bioperl (just in Java).
>>>>>>>>>    use Bio::SearchIO;
>>>>>>>>>    # format can be 'fasta', 'blast'
>>>>>>>>>    my $searchio = new Bio::SearchIO( -format => 'blastxml',
>>>>>>>>>                                      -file   => 'blastout.xml' );
>>>>>>>>>    while ( my $result = $searchio->next_result() ) {
>>>>>>>>>        while ( my $hit = $result->next_hit ) {
>>>>>>>>>            # process the Bio::Search::Hit::HitI object
>>>>>>>>>            while ( my $hsp = $hit->next_hsp ) {
>>>>>>>>>                # process the Bio::Search::HSP::HSPI object
>>>>>>>>>            }
>>>>>>>>>        }
>>>>>>>>>    }
>>>>>>>>>
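A Java counterpart to the next_result loop could take the shape of a lazy Iterator that holds only one result in memory at a time. The sketch below is hypothetical and deliberately simplified: ResultIteratorSketch is not a BioJava class, it returns the raw text of each result rather than a parsed object, and it assumes that every result in a plain-text report starts with a line beginning with "Query=".

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical sketch: yields one result at a time instead of a full list.
class ResultIteratorSketch implements Iterator<String> {
    private final BufferedReader in;
    private String pendingHeader; // "Query=" line that starts the next result

    ResultIteratorSketch(BufferedReader in) {
        this.in = in;
        this.pendingHeader = readUntilHeader(null);
    }

    // Reads lines until the next "Query=" line (assumed result delimiter),
    // appending skipped lines to sink when one is given.
    private String readUntilHeader(StringBuilder sink) {
        try {
            String line;
            while ((line = in.readLine()) != null) {
                if (line.startsWith("Query=")) {
                    return line;
                }
                if (sink != null) {
                    sink.append(line).append('\n');
                }
            }
            return null; // end of input
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public boolean hasNext() {
        return pendingHeader != null;
    }

    @Override
    public String next() {
        if (pendingHeader == null) {
            throw new NoSuchElementException();
        }
        StringBuilder result = new StringBuilder(pendingHeader).append('\n');
        pendingHeader = readUntilHeader(result);
        return result.toString();
    }

    public static void main(String[] args) {
        String report = "BLASTP\nQuery= q1\nhit a\nQuery= q2\nhit b\n";
        Iterator<String> results =
                new ResultIteratorSketch(new BufferedReader(new StringReader(report)));
        while (results.hasNext()) {
            System.out.print(results.next()); // process one result, then discard it
        }
    }
}
```

Nested iterators for hits and HSPs could follow the same pattern; the point here is only that memory use stays bounded by the size of a single result.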
>>>>>>>>> Would you be interested in such a contribution ?
>>>>>>>>>
>>>>>>>>> Best regards,
>>>>>>>>> Michael
>>>>>>>>> _______________________________________________
>>>>>>>>> biojava-dev mailing list
>>>>>>>>> biojava-dev at lists.open-bio.org
>>>>>>>>> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>>>>>>>>
>>>>>>>> -----------------------------------------------------------------------
>>>>>>>>
>>>>>>>> Andreas Prlic      Wellcome Trust Sanger Institute
>>>>>>>>                              Hinxton, Cambridge CB10 1SA, UK
>>>>>>>>                              +44 (0) 1223 49 6891
>>>>>>>>
>>>>>>>> -----------------------------------------------------------------------
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> The Wellcome Trust Sanger Institute is operated by Genome  
>>>>>>>> Research
>>>>>>>> Limited, a charity registered in England with number 1021457  
>>>>>>>> and a
>>>>>>>> company registered in England with number 2742969, whose  
>>>>>>>> registered
>>>>>>>> office is 215 Euston Road, London, NW1 2BE.
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>
>> -----------------------------------------------------------------------
>>
>> Andreas Prlic      Wellcome Trust Sanger Institute
>>                              Hinxton, Cambridge CB10 1SA, UK
>>                              +44 (0) 1223 49 6891
>>
>> -----------------------------------------------------------------------
>>
>>
>>
>>
>> -- 
>> The Wellcome Trust Sanger Institute is operated by Genome
>> Research Limited, a charity registered in England with number
>> 1021457 and a company registered in England with number 2742969,
>> whose registered office is 215 Euston Road, London, NW1 2BE.
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign
