[Biojava-dev] bioperl like blastparser

Jason Stajich jason at bioperl.org
Wed Dec 26 06:32:20 UTC 2007


You just need to put the repositor(ies) in
/home/svn-repositories/biojava

Anyone in the biojava group can write there.
You'll want to delete the existing biojava-live that is in there.

I'm traveling most of the 26th and will be on vacation most of the week,
but will check in when I have a chance.

-jason

On Dec 25, 2007, at 3:42 PM, Andreas Prlic wrote:

> Hi Mark,
>
> Unfortunately the biojava svn repository is not ready yet.
>
> George has converted our CVS to an initial svn dump, which I tested,
> fixing some details along the way.
> This dump has been ready since December 17th (see
> dev.open-bio.org:~andreas/biojava-final.svndump.bz2).
> The next step is to load this into the public open-bio repository,  
> after which (and some more testing)  the new biojava repository  
> would be ready for new commits.
>
> At present I am waiting for somebody who has admin rights on
> the open-bio servers to do these final steps
> (or to delegate and give permissions to somebody else).
>
> I tried to contact support at open-bio, root-l, as well as mailing
> several people directly,
> but so far I have not had a response. It could be that the holiday
> season is slowing response times down...
>
> Andreas
>
>
>
> On 25 Dec 2007, at 21:44, Mark Schreiber wrote:
>
>> Hi -
>>
>> When will the subversion system be ready for checkin?
>>
>> - Mark
>>
>> On Dec 24, 2007 4:29 PM, Michael Gang <michaelgang at gmail.com> wrote:
>>> OK,
>>> I made four changes.
>>> In the package org.biojava.bio.program.sax, in class BlastSaxParser:
>>> 1) At line 86 I added the variable
>>> private String oQueryLength;
>>> 2) In the method private void interpret(String poLine) throws SAXException,
>>> inside the block "if (iState == IN_HEADER) {",
>>> at line 209 I added
>>>
>>> if (poLine.startsWith("(", 9) && poLine.endsWith("letters)")) {
>>>     StringTokenizer st = new StringTokenizer(poLine);
>>>     oQueryLength = st.nextToken().substring(1);
>>> }
>>> 3) In the method private void emitHeaderIds() throws SAXException,
>>> at line 564 I added
>>>     oAttQName.setQName("queryLength");
>>>     oAtts.addAttribute(oAttQName.getURI(),
>>>                        oAttQName.getLocalName(),
>>>                        oAttQName.getQName(),
>>>                        "CDATA", oQueryLength);
>>>
>>> In the package org.biojava.bio.program.ssbind, in HeaderStAXHandler.java:
>>> 4) In the private class QueryIDStAXHandler, at line 95, I changed the
>>> method startElement
>>>
>>>        public void startElement(String            uri,
>>>                                 String            localName,
>>>                                 String            qName,
>>>                                 Attributes        attr,
>>>                                 DelegationManager dm)
>>>        throws SAXException
>>>        {
>>>            ssContext.getSearchContentHandler().setQueryID(attr.getValue("id"));
>>>            if (attr.getValue("queryLength") != null)
>>>            {
>>>                ssContext.getSearchContentHandler().addSearchProperty("queryLength",
>>>                        attr.getValue("queryLength"));
>>>            }
>>>        }
>>>    }
>>>
>>> Now query length is a property of the annotation  of a blast result.
>>> It is really fun to participate in the biojava project.
>>>
>>> Best regards,
>>> Michael
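
The header check in change 2 can be tried in isolation. The following is
a minimal sketch, assuming the BLAST header line looks like
"         (1234 letters)" with the opening parenthesis at column 9. The
class name QueryLengthSketch and the method extractQueryLength are
hypothetical and not part of BioJava; only the condition and the
tokenizer logic are taken from the change above.

```java
import java.util.Optional;
import java.util.StringTokenizer;

public class QueryLengthSketch {

    // Hypothetical helper: returns the text between "(" and the first
    // whitespace of a header line, or empty if the line does not match.
    public static Optional<String> extractQueryLength(String line) {
        // Same condition as in the change above: "(" at offset 9 and
        // the line ending in "letters)".
        if (line.startsWith("(", 9) && line.endsWith("letters)")) {
            StringTokenizer st = new StringTokenizer(line);
            // First whitespace-separated token is "(1234"; drop the "(".
            return Optional.of(st.nextToken().substring(1));
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Assumed sample lines, not taken from a real BLAST report.
        System.out.println(extractQueryLength("         (1234 letters)"));
        System.out.println(extractQueryLength("Query= sample"));
    }
}
```

The value is kept as a String, as in the change above. If a report
prints the length with thousands separators, the extracted text would
contain the commas and would need cleaning before it is parsed as a
number.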
>>>
>>>
>>> On Dec 24, 2007 2:32 AM, Mark Schreiber  
>>> <markjschreiber at gmail.com> wrote:
>>>> Hi -
>>>>
>>>> We are currently merging the code base into subversion (from CVS);
>>>> after this it will be possible to check in code again.  For small
>>>> additions it is usually easier to post the code to the dev list (in
>>>> the body of the email, as the list doesn't like attachments) or send it
>>>> to one of the regular committers and get them to add it.
>>>>
>>>> The JUnit tests are the standard test package. If you have added new
>>>> functionality it would be a good idea to add another test method in
>>>> the appropriate JUnit test to make sure it works (and continues to
>>>> work in the future).
>>>>
>>>> - Mark
>>>>
>>>>
>>>> On Dec 23, 2007 11:22 PM, Michael Gang <michaelgang at gmail.com>  
>>>> wrote:
>>>>> Hi all,
>>>>>
>>>>> I've now added the extraction of the query length.
>>>>> Can someone explain to me the procedure for checking in code to
>>>>> biojava?
>>>>> I ran the unit tests in the biojava distribution. Are there
>>>>> additional tests available?
>>>>>
>>>>> Best regards,
>>>>> Michael
>>>>>
>>>>>
>>>>> On Dec 21, 2007 9:59 AM, Mark Schreiber  
>>>>> <markjschreiber at gmail.com> wrote:
>>>>>> Hi -
>>>>>>
>>>>>> It is not required that you turn all Blast results into objects.
>>>>>> Because it is an event-based parser, you can do what you want with
>>>>>> the events, including turning them into objects or echoing them to
>>>>>> STDOUT.
>>>>>> Take a look at the examples in the cookbook.
>>>>>>
>>>>>> It may be that the query length is actually parsed but is not passed
>>>>>> on to the object model by the event listeners.
>>>>>>
>>>>>> - Mark
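
To illustrate the event-based approach, here is a minimal sketch of a
handler that echoes events instead of building objects. It uses the
JDK's own XML SAX parser on a small made-up XML string as a stand-in
for the event stream; it does not use the BioJava blast parser, and the
element names Header and QueryId are assumptions. Only the attribute
names id and queryLength come from this thread.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class EventEchoSketch {

    // Records one line per start-element event; no object model is built.
    static class EchoHandler extends DefaultHandler {
        final StringBuilder out = new StringBuilder();

        @Override
        public void startElement(String uri, String localName,
                                 String qName, Attributes attrs) {
            out.append("start ").append(qName);
            for (int i = 0; i < attrs.getLength(); i++) {
                out.append(' ').append(attrs.getQName(i))
                   .append('=').append(attrs.getValue(i));
            }
            out.append('\n');
        }
    }

    // Parses the XML text and returns the echoed events as a string.
    public static String echo(String xml) {
        EchoHandler handler = new EchoHandler();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("could not parse input", e);
        }
        return handler.out.toString();
    }

    public static void main(String[] args) {
        // Hypothetical event source; element names are made up.
        String xml = "<Header><QueryId id=\"q1\" queryLength=\"1234\"/></Header>";
        System.out.print(echo(xml));
    }
}
```

A handler like this sees every attribute the parser emits, so it is
also a quick way to check whether a value such as the query length is
present in the events but dropped before it reaches the object model.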
>>>>>>
>>>>>>
>>>>>> On Dec 21, 2007 12:15 AM, Andreas Prlic <ap3 at sanger.ac.uk> wrote:
>>>>>>> Hi Michael,
>>>>>>>
>>>>>>> The blast parser (BlastLikeSaxParser) in BioJava has been around for
>>>>>>> a while and is frequently used to parse a variety of different blast
>>>>>>> outputs. Still, it is not complete and cannot parse PSI blast. We
>>>>>>> have had a number of requests about it lately, so I suppose it needs
>>>>>>> a little maintenance now.
>>>>>>>
>>>>>>> Writing a new blast parser from scratch would take a significant
>>>>>>> amount of time: fixing all the bugs, adding support for the
>>>>>>> different blast versions and writing documentation. Much of this is
>>>>>>> already available in BioJava, so I would prefer it if you could
>>>>>>> submit patches for the current blast parser. Would you also be
>>>>>>> interested in collaborating in this direction?
>>>>>>> Another feature that would be nice to support is the possibility of
>>>>>>> sending off blast searches to web services...
>>>>>>>
>>>>>>> Cheers,
>>>>>>> Andreas
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 20 Dec 2007, at 12:54, Michael Gang wrote:
>>>>>>>
>>>>>>>> Hi All,
>>>>>>>>
>>>>>>>> I have used the interface of the Java blast parser and had two
>>>>>>>> main problems with it:
>>>>>>>> 1) The blast parser does not parse all the information (for
>>>>>>>> example, the query length).
>>>>>>>> 2) The blast parser parses the whole blast report into a list,
>>>>>>>> which eats a lot of memory.
>>>>>>>>
>>>>>>>> I would be interested in writing and contributing a blast parser
>>>>>>>> which parses all the information in the blast report and parses
>>>>>>>> the report iteratively.
>>>>>>>> Something like the following code in bioperl (just in Java).
>>>>>>>>   use Bio::SearchIO;
>>>>>>>>     # format can be 'fasta', 'blast'
>>>>>>>>     my $searchio = new Bio::SearchIO( -format => 'blastxml',
>>>>>>>>                                       -file   => 'blastout.xml' );
>>>>>>>>     while ( my $result = $searchio->next_result() ) {
>>>>>>>>         while ( my $hit = $result->next_hit ) {
>>>>>>>>             # process the Bio::Search::Hit::HitI object
>>>>>>>>             while ( my $hsp = $hit->next_hsp ) {
>>>>>>>>                 # process the Bio::Search::HSP::HSPI object
>>>>>>>>             }
>>>>>>>>         }
>>>>>>>>     }
>>>>>>>>
>>>>>>>> Would you be interested in such a contribution?
>>>>>>>>
>>>>>>>> Best regards,
>>>>>>>> Michael
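
As a rough idea of what such an iterative interface could look like in
Java, here is a minimal sketch. Everything in it is hypothetical: the
types Result, Hit and Hsp, the class ResultIterator, and the toy line
format (RESULT / HIT / HSP lines) are assumptions made for illustration
and are neither real BLAST output nor an existing BioJava API. The
point is only that one result is read at a time, so the whole report
never has to be held in memory.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

public class IterativeSearchSketch {

    // Hypothetical value types mirroring the result / hit / HSP nesting.
    public static class Hsp {
        public final double score;
        Hsp(double score) { this.score = score; }
    }
    public static class Hit {
        public final String id;
        public final List<Hsp> hsps = new ArrayList<>();
        Hit(String id) { this.id = id; }
    }
    public static class Result {
        public final String queryId;
        public final List<Hit> hits = new ArrayList<>();
        Result(String queryId) { this.queryId = queryId; }
    }

    // Reads one result per call to next() from a toy format:
    //   RESULT <queryId>, then HIT <hitId> lines, each followed by HSP <score> lines.
    // Assumes the input, if not empty, starts with a RESULT line.
    public static class ResultIterator implements Iterator<Result> {
        private final BufferedReader in;
        private String pending; // lookahead: next unread RESULT line, or null at end

        public ResultIterator(BufferedReader in) {
            this.in = in;
            this.pending = readLine();
        }

        private String readLine() {
            try {
                return in.readLine();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }

        @Override
        public boolean hasNext() { return pending != null; }

        @Override
        public Result next() {
            if (pending == null) throw new NoSuchElementException();
            Result result = new Result(pending.substring("RESULT ".length()));
            Hit current = null;
            // Consume lines until the next RESULT line or the end of input.
            while ((pending = readLine()) != null && !pending.startsWith("RESULT ")) {
                if (pending.startsWith("HIT ")) {
                    current = new Hit(pending.substring(4));
                    result.hits.add(current);
                } else if (pending.startsWith("HSP ") && current != null) {
                    current.hsps.add(new Hsp(Double.parseDouble(pending.substring(4))));
                }
            }
            return result;
        }
    }

    public static void main(String[] args) {
        String report = "RESULT q1\nHIT h1\nHSP 50.0\nHSP 42.5\nRESULT q2\nHIT h2\nHSP 10.0\n";
        Iterator<Result> results =
                new ResultIterator(new BufferedReader(new StringReader(report)));
        while (results.hasNext()) {
            Result result = results.next();
            for (Hit hit : result.hits) {
                for (Hsp hsp : hit.hsps) {
                    System.out.println(result.queryId + " " + hit.id + " " + hsp.score);
                }
            }
        }
    }
}
```

In this sketch only the results are streamed; the hits and HSPs of the
current result are still collected in lists. Streaming those as well
would need the same lookahead pattern one level down.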
>>>>>>>> _______________________________________________
>>>>>>>> biojava-dev mailing list
>>>>>>>> biojava-dev at lists.open-bio.org
>>>>>>>> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>>>>>>>
>>>>>>> -----------------------------------------------------------------------
>>>>>>>
>>>>>>> Andreas Prlic      Wellcome Trust Sanger Institute
>>>>>>>                    Hinxton, Cambridge CB10 1SA, UK
>>>>>>>                    +44 (0) 1223 49 6891
>>>>>>>
>>>>>>> -----------------------------------------------------------------------
>>>>>>>
>>>>>>> --
>>>>>>>  The Wellcome Trust Sanger Institute is operated by Genome Research
>>>>>>>  Limited, a charity registered in England with number 1021457 and a
>>>>>>>  company registered in England with number 2742969, whose registered
>>>>>>>  office is 215 Euston Road, London, NW1 2BE.
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>



