[Biojava-dev] bioperl like blastparser
Chris Fields
cjfields at uiuc.edu
Wed Dec 26 14:59:46 UTC 2007
It looks like someone got around to it already (biojava/biojava-live
is new, permissions look set). Is everything working?
chris
On Dec 26, 2007, at 12:32 AM, Jason Stajich wrote:
> You just need to put the repositor(ies) in
> /home/svn-repositories/biojava
>
> anyone in the biojava group can write there.
> you'll want to delete the existing biojava-live that is in there.
>
> I'm traveling most of 26th and will be on vacation most of the week,
> but will check in when I have a chance.
>
> -jason
>
> On Dec 25, 2007, at 3:42 PM, Andreas Prlic wrote:
>
>> Hi Mark,
>>
>> Unfortunately the biojava svn repository is not ready yet.
>>
>> George has converted our CVS to an initial svn dump, which I tested,
>> fixing some details along the way.
>> This dump has been ready since December 17th (see
>> dev.open-bio.org:~andreas/biojava-final.svndump.bz2).
>> The next step is to load this into the public open-bio repository,
>> after which (and some more testing) the new biojava repository
>> would be ready for new commits.
>>
>> At present I am waiting for somebody who has admin rights on
>> the open-bio servers to do these final steps.
>> (or to delegate and give permissions to somebody else).
>>
>> I tried to contact support at open-bio, root-l, as well as mailing
>> several people directly,
>> but so far I have not received a response. It could be that the holiday
>> season is slowing response times down...
>>
>> Andreas
>>
>>
>>
>> On 25 Dec 2007, at 21:44, Mark Schreiber wrote:
>>
>>> Hi -
>>>
>>> When will the subversion system be ready for checkin?
>>>
>>> - Mark
>>>
>>> On Dec 24, 2007 4:29 PM, Michael Gang <michaelgang at gmail.com> wrote:
>>>> OK,
>>>> I made four changes.
>>>> In the package org.biojava.bio.program.sax, in class BlastSaxParser:
>>>> 1) At line 86 I added the variable
>>>>     private String oQueryLength;
>>>> 2) In the method
>>>>     private void interpret(String poLine) throws SAXException
>>>> inside the block "if (iState == IN_HEADER) {", at line 209 I added
>>>>
>>>>     if (poLine.startsWith("(", 9) && poLine.endsWith("letters)")) {
>>>>         StringTokenizer st = new StringTokenizer(poLine);
>>>>         oQueryLength = st.nextToken().substring(1);
>>>>     }
>>>> 3) In the method
>>>>     private void emitHeaderIds() throws SAXException {
>>>> at line 564 I added
>>>>     oAttQName.setQName("queryLength");
>>>>     oAtts.addAttribute(oAttQName.getURI(),
>>>>                        oAttQName.getLocalName(),
>>>>                        oAttQName.getQName(),
>>>>                        "CDATA", oQueryLength);
>>>>
>>>> In the package org.biojava.bio.program.ssbind, in HeaderStAXHandler.java:
>>>> 4) In the private class QueryIDStAXHandler, at line 95, I changed the
>>>> method startElement:
>>>>
>>>>     public void startElement(String uri,
>>>>                              String localName,
>>>>                              String qName,
>>>>                              Attributes attr,
>>>>                              DelegationManager dm)
>>>>         throws SAXException
>>>>     {
>>>>         ssContext.getSearchContentHandler().setQueryID(attr.getValue("id"));
>>>>         if (attr.getValue("queryLength") != null)
>>>>         {
>>>>             ssContext.getSearchContentHandler().addSearchProperty(
>>>>                 "queryLength", attr.getValue("queryLength"));
>>>>         }
>>>>     }
>>>> }
>>>>
>>>> Now query length is a property of the annotation of a blast
>>>> result.
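
The header check from change 2 above can be tried in isolation. What follows is only a minimal sketch, assuming a header line of the shape "(1234 letters)" indented by nine spaces, which is what the startsWith("(", 9) test implies. The class name QueryLengthSketch and the method parseQueryLength are made up for illustration and are not part of BioJava.

```java
import java.util.StringTokenizer;

// Hypothetical standalone class for illustration; not part of BioJava.
class QueryLengthSketch {

    /**
     * Returns the text between "(" and the first whitespace of a header
     * line such as "         (1234 letters)", or null for any other line.
     */
    static String parseQueryLength(String line) {
        // Same condition as in the patch: "(" at offset 9, "letters)" at the end.
        if (line.startsWith("(", 9) && line.endsWith("letters)")) {
            StringTokenizer st = new StringTokenizer(line);
            // The first token is "(1234"; drop the leading parenthesis.
            return st.nextToken().substring(1);
        }
        return null;
    }

    public static void main(String[] args) {
        // Build the assumed nine-space indent explicitly.
        String header = String.format("%9s", "") + "(1234 letters)";
        System.out.println(parseQueryLength(header));
        System.out.println(parseQueryLength("Query= my_sequence"));
    }
}
```

The value is returned as a String, as in the patch. If a report prints the length with thousands separators, they would be kept in the returned text, so a caller that wants a number would have to strip them before parsing.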
>>>> It is really fun to participate in the biojava project.
>>>>
>>>> Best regards,
>>>> Michael
>>>>
>>>>
>>>> On Dec 24, 2007 2:32 AM, Mark Schreiber
>>>> <markjschreiber at gmail.com> wrote:
>>>>> Hi -
>>>>>
>>>>> We are currently merging the code base into subversion (from CVS).
>>>>> After this it will be possible to check in code again. For small
>>>>> additions it is usually easier to post the code to the dev list
>>>>> (in
>>>>> the body of the email as the list doesn't like attachments) or
>>>>> send it
>>>>> to one of the regular committers and get them to add it.
>>>>>
>>>>> The JUnit tests are the standard test package. If you have added
>>>>> new
>>>>> functionality it would be a good idea to add another test method
>>>>> in
>>>>> the appropriate JUnit test to make sure it works (and continues to
>>>>> work in the future).
>>>>>
>>>>> - Mark
>>>>>
>>>>>
>>>>> On Dec 23, 2007 11:22 PM, Michael Gang <michaelgang at gmail.com>
>>>>> wrote:
>>>>>> Hi all,
>>>>>>
>>>>>> I've now added the extraction of the query length.
>>>>>> Can someone explain to me the procedure for checking in code to
>>>>>> biojava?
>>>>>> I ran the unit tests in the biojava distribution. Are there
>>>>>> additional
>>>>>> tests available?
>>>>>>
>>>>>> Best regards,
>>>>>> Michael
>>>>>>
>>>>>>
>>>>>> On Dec 21, 2007 9:59 AM, Mark Schreiber
>>>>>> <markjschreiber at gmail.com> wrote:
>>>>>>> Hi -
>>>>>>>
>>>>>>> It is not required that you turn all Blast results into objects.
>>>>>>> Because it is an event-based parser, you can do what you want
>>>>>>> with the
>>>>>>> events, including turning them into objects or echoing them to
>>>>>>> STDOUT.
>>>>>>> Take a look at the examples in the cookbook.
>>>>>>>
>>>>>>> It may be that the query length is actually parsed but is not
>>>>>>> passed
>>>>>>> onto the object model by the event listeners.
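
As a rough illustration of the event-based style described above, here is a minimal sketch that uses the SAX parser shipped with the JDK rather than BioJava's own blast parser. The class name EventEchoSketch, the method echoEvents and the tiny XML snippet are all made up for the example. The handler echoes each event to STDOUT and also keeps a copy in a list, to show that the caller decides what happens to an event.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical class for illustration; it does not use any BioJava API.
class EventEchoSketch {

    /** Parses the XML, echoes every start/end event and returns them as text. */
    static List<String> echoEvents(String xml) {
        final List<String> seen = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                StringBuilder sb = new StringBuilder("start " + qName);
                // Attributes arrive with the event; nothing is stored unless we store it.
                for (int i = 0; i < atts.getLength(); i++) {
                    sb.append(' ').append(atts.getQName(i))
                      .append('=').append(atts.getValue(i));
                }
                seen.add(sb.toString());
                System.out.println(sb);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                seen.add("end " + qName);
                System.out.println("end " + qName);
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            // Wrapped so that callers do not need to declare checked exceptions.
            throw new RuntimeException(e);
        }
        return seen;
    }

    public static void main(String[] args) {
        // Made-up input; the element and attribute names are assumptions.
        echoEvents("<Header><QueryId id=\"q1\" queryLength=\"1234\"/></Header>");
    }
}
```

A listener that ignores an attribute, as suspected above for the query length, would simply never copy it out of the Attributes object; the parser would still have delivered it.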
>>>>>>>
>>>>>>> - Mark
>>>>>>>
>>>>>>>
>>>>>>> On Dec 21, 2007 12:15 AM, Andreas Prlic <ap3 at sanger.ac.uk>
>>>>>>> wrote:
>>>>>>>> Hi Michael,
>>>>>>>>
>>>>>>>> The blast parser (BlastLikeSaxParser) in BioJava has been
>>>>>>>> around for
>>>>>>>> a while and is frequently being used to parse a variety
>>>>>>>> of different blast outputs. Still, it is not complete and
>>>>>>>> cannot
>>>>>>>> parse PSI blast. We have had a number of requests about it
>>>>>>>> lately
>>>>>>>> so I suppose it needs a little maintenance now.
>>>>>>>>
>>>>>>>> To write a new blast parser from scratch will involve a
>>>>>>>> significant
>>>>>>>> amount of time. It will take time to fix all the bugs, add
>>>>>>>> support
>>>>>>>> for the different blast versions and write documentation.
>>>>>>>> Much of
>>>>>>>> this is already available in BioJava, so I would prefer if
>>>>>>>> you could
>>>>>>>> submit patches for
>>>>>>>> the current blast parser. Would you also be interested in
>>>>>>>> collaborating in this direction?
>>>>>>>> Another feature that would be nice to add support for is the
>>>>>>>> possibility to send off blast searches to webservices...
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>> Andreas
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On 20 Dec 2007, at 12:54, Michael Gang wrote:
>>>>>>>>
>>>>>>>>> Hi All,
>>>>>>>>>
>>>>>>>>> I used the interface of the java blast parser.
>>>>>>>>> I had mainly two problems with it:
>>>>>>>>> 1) The blast parser does not parse all the information (for
>>>>>>>>> example
>>>>>>>>> query length)
>>>>>>>>> 2) The blast parser parses the whole blast report into a
>>>>>>>>> list which
>>>>>>>>> eats a lot of memory.
>>>>>>>>>
>>>>>>>>> I would be interested to write and contribute a blast parser
>>>>>>>>> which
>>>>>>>>> parses all the information of the blast and parses the blast
>>>>>>>>> iteratively.
>>>>>>>>> Something like the following code in bioperl (just in Java).
>>>>>>>>> use Bio::SearchIO;
>>>>>>>>> # format can be 'fasta', 'blast'
>>>>>>>>> my $searchio = Bio::SearchIO->new( -format => 'blastxml',
>>>>>>>>>                                    -file   => 'blastout.xml' );
>>>>>>>>> while ( my $result = $searchio->next_result() ) {
>>>>>>>>>     while ( my $hit = $result->next_hit ) {
>>>>>>>>>         # process the Bio::Search::Hit::HitI object
>>>>>>>>>         while ( my $hsp = $hit->next_hsp ) {
>>>>>>>>>             # process the Bio::Search::HSP::HSPI object
>>>>>>>>>         }
>>>>>>>>>     }
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>> Would you be interested in such a contribution ?
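
For comparison, the iterative style of the bioperl loop above could look roughly like this in Java. This is only a sketch under stated assumptions: it uses the StAX pull parser from the JDK, not any BioJava class, and it assumes BLAST XML input in which each hit carries a Hit_id element. The class name HitIteratorSketch and the sample document are made up. To keep it short, the iterator yields only hit identifiers, one at a time, instead of full hit and HSP objects.

```java
import java.io.StringReader;
import java.util.Iterator;
import java.util.NoSuchElementException;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical class for illustration; not part of BioJava.
class HitIteratorSketch implements Iterator<String> {

    private final XMLStreamReader reader;
    private String nextId; // the hit that next() will return, or null at the end

    HitIteratorSketch(String xml) {
        try {
            XMLInputFactory factory = XMLInputFactory.newInstance();
            // Do not process a DOCTYPE, in case the report declares one.
            factory.setProperty(XMLInputFactory.SUPPORT_DTD, Boolean.FALSE);
            reader = factory.createXMLStreamReader(new StringReader(xml));
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException(e);
        }
        advance();
    }

    /** Pulls events only until the next Hit_id, so one hit is held at a time. */
    private void advance() {
        nextId = null;
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && "Hit_id".equals(reader.getLocalName())) {
                    nextId = reader.getElementText();
                    return;
                }
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }

    @Override
    public boolean hasNext() {
        return nextId != null;
    }

    @Override
    public String next() {
        if (nextId == null) {
            throw new NoSuchElementException();
        }
        String id = nextId;
        advance();
        return id;
    }

    public static void main(String[] args) {
        // Made-up, heavily trimmed report.
        String xml = "<BlastOutput><Hit><Hit_id>gi|1</Hit_id></Hit>"
                   + "<Hit><Hit_id>gi|2</Hit_id></Hit></BlastOutput>";
        Iterator<String> hits = new HitIteratorSketch(xml);
        while (hits.hasNext()) {
            System.out.println(hits.next());
        }
    }
}
```

Because the reader only moves forward when next() is called, memory use does not grow with the size of the report, which is the point of the second problem listed above.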
>>>>>>>>>
>>>>>>>>> Best regards,
>>>>>>>>> Michael
>>>>>>>>> _______________________________________________
>>>>>>>>> biojava-dev mailing list
>>>>>>>>> biojava-dev at lists.open-bio.org
>>>>>>>>> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>>>>>>>>
>>>>>>>> -----------------------------------------------------------------------
>>>>>>>>
>>>>>>>> Andreas Prlic Wellcome Trust Sanger Institute
>>>>>>>> Hinxton, Cambridge CB10 1SA, UK
>>>>>>>> +44 (0) 1223 49 6891
>>>>>>>>
>>>>>>>> -----------------------------------------------------------------------
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> The Wellcome Trust Sanger Institute is operated by Genome
>>>>>>>> Research
>>>>>>>> Limited, a charity registered in England with number 1021457
>>>>>>>> and a
>>>>>>>> company registered in England with number 2742969, whose
>>>>>>>> registered
>>>>>>>> office is 215 Euston Road, London, NW1 2BE.
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>
>> -----------------------------------------------------------------------
>>
>> Andreas Prlic Wellcome Trust Sanger Institute
>> Hinxton, Cambridge CB10 1SA, UK
>> +44 (0) 1223 49 6891
>>
>> -----------------------------------------------------------------------
>>
>>
>>
>>
>> --
>> The Wellcome Trust Sanger Institute is operated by Genome
>> Research Limited, a charity registered in England with number
>> 1021457 and a company registered in England with number 2742969,
>> whose registered office is 215 Euston Road, London, NW1 2BE.
>
Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign