[Biojava-dev] bioperl like blastparser
Jason Stajich
jason at bioperl.org
Wed Dec 26 06:32:20 UTC 2007
You just need to put the repositor(ies) in
/home/svn-repositories/biojava
Anyone in the biojava group can write there.
You'll want to delete the existing biojava-live that is in there.
I'm traveling most of the 26th and will be on vacation most of the week,
but will check in when I have a chance.
-jason
On Dec 25, 2007, at 3:42 PM, Andreas Prlic wrote:
> Hi Mark,
>
> Unfortunately the biojava svn repository is not ready yet.
>
> George has converted our CVS to an initial svn dump, which I tested,
> fixing some details along the way.
> This dump has been ready since December 17th (see
> dev.open-bio.org:~andreas/biojava-final.svndump.bz2).
> The next step is to load this into the public open-bio repository,
> after which (and some more testing) the new biojava repository
> would be ready for new commits.
>
> At present I am waiting for somebody who has admin rights on
> the open-bio servers to do these final steps
> (or to delegate and give permissions to somebody else).
>
> I tried to contact support at open-bio, root-l, as well as mailing
> several people directly,
> but so far I have not had a response. It could be that the holiday
> season is slowing response times down...
>
> Andreas
>
>
>
> On 25 Dec 2007, at 21:44, Mark Schreiber wrote:
>
>> Hi -
>>
>> When will the subversion system be ready for checkin?
>>
>> - Mark
>>
>> On Dec 24, 2007 4:29 PM, Michael Gang <michaelgang at gmail.com> wrote:
>>> OK,
>>> I made four changes.
>>> In the package org.biojava.bio.program.sax, in class BlastSaxParser:
>>> 1) At line 86 I added the variable
>>>     private String oQueryLength;
>>> 2) In the method
>>>     private void interpret(String poLine) throws SAXException
>>> inside the block "if (iState == IN_HEADER) {", at line 209 I added
>>>
>>>     if (poLine.startsWith("(", 9) && poLine.endsWith("letters)")) {
>>>         StringTokenizer st = new StringTokenizer(poLine);
>>>         oQueryLength = st.nextToken().substring(1);
>>>     }
>>> 3) In the method private void emitHeaderIds() throws SAXException,
>>> at line 564 I added
>>>     oAttQName.setQName("queryLength");
>>>     oAtts.addAttribute(oAttQName.getURI(),
>>>                        oAttQName.getLocalName(),
>>>                        oAttQName.getQName(),
>>>                        "CDATA", oQueryLength);
>>>
>>> In the package org.biojava.bio.program.ssbind, in
>>> HeaderStAXHandler.java:
>>> 4) In the private class QueryIDStAXHandler, at line 95, I changed the
>>> method startElement:
>>>
>>>     public void startElement(String uri,
>>>                              String localName,
>>>                              String qName,
>>>                              Attributes attr,
>>>                              DelegationManager dm)
>>>         throws SAXException
>>>     {
>>>         ssContext.getSearchContentHandler().setQueryID(attr.getValue("id"));
>>>         if (attr.getValue("queryLength") != null)
>>>         {
>>>             ssContext.getSearchContentHandler().addSearchProperty(
>>>                 "queryLength", attr.getValue("queryLength"));
>>>         }
>>>     }
>>> }
>>>
>>> Now query length is a property of the annotation of a blast result.
>>> It is really fun to participate in the biojava project.
>>>
>>> Best regards,
>>> Michael
>>>
>>>
>>> On Dec 24, 2007 2:32 AM, Mark Schreiber
>>> <markjschreiber at gmail.com> wrote:
>>>> Hi -
>>>>
>>>> We are currently merging the code base into Subversion (from CVS);
>>>> after this it will be possible to check in code again. For small
>>>> additions it is usually easier to post the code to the dev list (in
>>>> the body of the email, as the list doesn't like attachments) or to
>>>> send it to one of the regular committers and get them to add it.
>>>>
>>>> The JUnit tests are the standard test package. If you have added new
>>>> functionality it would be a good idea to add another test method in
>>>> the appropriate JUnit test to make sure it works (and continues to
>>>> work in the future).
>>>>
>>>> - Mark
>>>>
>>>>
>>>> On Dec 23, 2007 11:22 PM, Michael Gang <michaelgang at gmail.com>
>>>> wrote:
>>>>> Hi all,
>>>>>
>>>>> I've now added the extraction of the query length.
>>>>> Can someone explain to me the procedure for checking in code to
>>>>> biojava?
>>>>> I ran the unit tests in the biojava distribution. Are there
>>>>> additional tests available?
>>>>>
>>>>> Best regards,
>>>>> Michael
>>>>>
>>>>>
>>>>> On Dec 21, 2007 9:59 AM, Mark Schreiber
>>>>> <markjschreiber at gmail.com> wrote:
>>>>>> Hi -
>>>>>>
>>>>>> It is not required that you turn all Blast results into objects.
>>>>>> Because it is an event-based parser, you can do what you want
>>>>>> with the events, including turning them into objects or echoing
>>>>>> them to STDOUT.
>>>>>> Take a look at the examples in the cookbook.
>>>>>>
>>>>>> It may be that the query length is actually parsed but is not
>>>>>> passed on to the object model by the event listeners.
>>>>>>
>>>>>> - Mark
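
To illustrate the event-based style described above, here is a minimal sketch that uses only the JDK's SAX API rather than BioJava's own parser classes. The class EventEchoSketch, the method echoEvents, and the element names in the sample XML (Report, QueryId, Hit) are all assumptions made for the example. The handler does not build any objects; it just records one line per element start event, which the caller can echo to STDOUT.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical illustration of event-based parsing; not BioJava code. */
final class EventEchoSketch {

    /** Parses the XML text and returns one line per element start event. */
    static List<String> echoEvents(String xml) {
        final List<String> lines = new ArrayList<String>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attrs) {
                // Record the element name followed by its attributes.
                StringBuilder sb = new StringBuilder(qName);
                for (int i = 0; i < attrs.getLength(); i++) {
                    sb.append(' ').append(attrs.getQName(i))
                      .append('=').append(attrs.getValue(i));
                }
                lines.add(sb.toString());
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample XML", e);
        }
        return lines;
    }

    public static void main(String[] args) {
        // Made-up sample document; element names are assumptions.
        String xml = "<Report><QueryId id=\"q1\" queryLength=\"1234\"/>"
                   + "<Hit id=\"h1\"/></Report>";
        for (String line : echoEvents(xml)) {
            System.out.println(line);
        }
    }
}
```

Swapping the body of startElement for code that fills in an object model would give the other behaviour mentioned above, without changing the parser side.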
>>>>>>
>>>>>>
>>>>>> On Dec 21, 2007 12:15 AM, Andreas Prlic <ap3 at sanger.ac.uk> wrote:
>>>>>>> Hi Michael,
>>>>>>>
>>>>>>> The blast parser (BlastLikeSaxParser) in BioJava has been around
>>>>>>> for a while and is frequently used to parse a variety of
>>>>>>> different blast outputs. Still, it is not complete and cannot
>>>>>>> parse PSI blast. We have had a number of requests about it lately,
>>>>>>> so I suppose it needs a little maintenance now.
>>>>>>>
>>>>>>> Writing a new blast parser from scratch will involve a significant
>>>>>>> amount of time. It will take time to fix all the bugs, add support
>>>>>>> for the different blast versions and write documentation. Much of
>>>>>>> this is already available in BioJava, so I would prefer it if you
>>>>>>> could submit patches for the current blast parser. Would you also
>>>>>>> be interested in collaborating in this direction?
>>>>>>> Another feature that would be nice to support is the possibility
>>>>>>> of sending blast searches off to webservices...
>>>>>>>
>>>>>>> Cheers,
>>>>>>> Andreas
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 20 Dec 2007, at 12:54, Michael Gang wrote:
>>>>>>>
>>>>>>>> Hi All,
>>>>>>>>
>>>>>>>> I used the interface of the java blast parser.
>>>>>>>> I had mainly two problems with it:
>>>>>>>> 1) The blast parser does not parse all the information (for
>>>>>>>> example, the query length).
>>>>>>>> 2) The blast parser parses the whole blast report into a list,
>>>>>>>> which eats a lot of memory.
>>>>>>>>
>>>>>>>> I would be interested in writing and contributing a blast parser
>>>>>>>> that parses all the information in the blast report and parses
>>>>>>>> it iteratively.
>>>>>>>> Something like the following code in bioperl (just in Java):
>>>>>>>>     use Bio::SearchIO;
>>>>>>>>     # format can be 'fasta', 'blast'
>>>>>>>>     my $searchio = new Bio::SearchIO( -format => 'blastxml',
>>>>>>>>                                       -file   => 'blastout.xml' );
>>>>>>>>     while ( my $result = $searchio->next_result() ) {
>>>>>>>>         while ( my $hit = $result->next_hit ) {
>>>>>>>>             # process the Bio::Search::Hit::HitI object
>>>>>>>>             while ( my $hsp = $hit->next_hsp ) {
>>>>>>>>                 # process the Bio::Search::HSP::HSPI object
>>>>>>>>             }
>>>>>>>>         }
>>>>>>>>     }
>>>>>>>>
>>>>>>>> Would you be interested in such a contribution?
>>>>>>>>
>>>>>>>> Best regards,
>>>>>>>> Michael
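
For comparison, the iterative style of the bioperl loop above could look roughly like this in Java. This is a hedged sketch built on the JDK's StAX pull parser, not an existing BioJava API: the class HitIteratorSketch is hypothetical, and the element name Hit_id in the sample document is assumed for illustration. Only one hit identifier is held in memory at a time, instead of a full list of results.

```java
import java.io.StringReader;
import java.util.Iterator;
import java.util.NoSuchElementException;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Hypothetical pull-style iterator over hit ids; not BioJava code. */
final class HitIteratorSketch implements Iterator<String> {
    private XMLStreamReader reader;
    private String nextId;

    HitIteratorSketch(String xml) {
        try {
            reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
        } catch (XMLStreamException e) {
            throw new IllegalStateException("could not open XML", e);
        }
        advance();
    }

    /** Reads forward only as far as the next Hit_id element or end of input. */
    private void advance() {
        nextId = null;
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && "Hit_id".equals(reader.getLocalName())) {
                    nextId = reader.getElementText();
                    return;
                }
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException("malformed XML", e);
        }
    }

    @Override
    public boolean hasNext() {
        return nextId != null;
    }

    @Override
    public String next() {
        if (nextId == null) {
            throw new NoSuchElementException();
        }
        String current = nextId;
        advance(); // look ahead by one hit only
        return current;
    }

    @Override
    public void remove() {
        throw new UnsupportedOperationException();
    }

    public static void main(String[] args) {
        // Made-up sample document; element names are assumptions.
        String xml = "<BlastOutput><Hit><Hit_id>hitA</Hit_id></Hit>"
                   + "<Hit><Hit_id>hitB</Hit_id></Hit></BlastOutput>";
        Iterator<String> hits = new HitIteratorSketch(xml);
        while (hits.hasNext()) {
            System.out.println(hits.next());
        }
    }
}
```

A fuller version would presumably return result, hit and HSP objects rather than plain strings, but the look-ahead-by-one pattern would stay the same.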
>>>>>>>> _______________________________________________
>>>>>>>> biojava-dev mailing list
>>>>>>>> biojava-dev at lists.open-bio.org
>>>>>>>> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>>>>>>>
>>>>>>> -----------------------------------------------------------------------
>>>>>>>
>>>>>>> Andreas Prlic Wellcome Trust Sanger Institute
>>>>>>> Hinxton, Cambridge CB10 1SA, UK
>>>>>>> +44 (0) 1223 49 6891
>>>>>>>
>>>>>>> -----------------------------------------------------------------------
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> The Wellcome Trust Sanger Institute is operated by Genome
>>>>>>> Research
>>>>>>> Limited, a charity registered in England with number 1021457
>>>>>>> and a
>>>>>>> company registered in England with number 2742969, whose
>>>>>>> registered
>>>>>>> office is 215 Euston Road, London, NW1 2BE.
>>>>>>>
>
> -----------------------------------------------------------------------
>
> Andreas Prlic Wellcome Trust Sanger Institute
> Hinxton, Cambridge CB10 1SA, UK
> +44 (0) 1223 49 6891
>
> -----------------------------------------------------------------------
>
> --
> The Wellcome Trust Sanger Institute is operated by Genome
> Research Limited, a charity registered in England with number
> 1021457 and a company registered in England with number 2742969,
> whose registered office is 215 Euston Road, London, NW1 2BE.