[Bioperl-l] About to tag the last RC...

Kristine Briedis KBriedis at accelrys.com
Tue Jan 20 23:04:01 UTC 2009


Hi Chris,

My name is Kristine Briedis and I recently joined Scott's group at Accelrys.  I just wanted to quickly follow up on Scott's email.  We traced the minor differences in BLAST scores back to revision 11578 of Bio::SearchIO::blast.pm (bugzilla #1986).  We will update our regression baselines to correspond with this bugfix.

We also discovered that the tag type differences in the XML representation of our Pipeline Pilot records were related to two different revisions.  We saw a change from "doublevalue" to "stringvalue" for the e-value after the aforementioned blast.pm revision 11578, and a change from "integervalue" to "doublevalue" for hsp hit gaps and hsp query gaps after revision 15014 of Bio::Search::HSP::GenericHSP.  Again, we don't think this is a problem; we'll just update our baselines.

Thanks again for the quick response and bugfixes.  Your work is very much appreciated!

Cheers,
Kristine


-----Original Message-----
From: Scott Markel 
Sent: Thursday, January 15, 2009 6:15 AM
To: Chris Fields
Cc: BioPerl List; Kristine Briedis
Subject: RE: [Bioperl-l] About to tag the last RC...

Chris,

Thank you for the quick reply, the changes, and for all of your work
to get 1.6 ready.

Scott


> -----Original Message-----
> From: Chris Fields [mailto:cjfields at illinois.edu]
> Sent: Wednesday, 14 January 2009 9:18 PM
> To: Scott Markel
> Cc: BioPerl List; Kristine Briedis
> Subject: Re: [Bioperl-l] About to tag the last RC...
> 
> On Jan 14, 2009, at 5:45 PM, Scott Markel wrote:
> 
> > Chris,
> >
> > We've been testing 1.6 RC2 with our set of nightly Pipeline Pilot
> > regressions and have noticed a few issues.  Sorry we couldn't get
> > this feedback to you sooner.
> >
> > 1) There is a problem with the output filename for bl2seq on
> > Windows.  In response to bug 2707, quotemeta was used when building
> > the parameter string at line 507 in
> > Bio::Tools::Run::StandAloneNCBIBlast (1.5.9_2).  This causes a
> > problem with the path to the output file on Windows.  For example,
> > "C:\DOCUME~1\outfile" becomes "C\:\\DOCUME\~1\\outfile".  bl2seq
> > can't open the output file and fails.
> 
> I've added an OS check for that so this isn't used with Windows (I
> wondered whether quotemeta would bite me there).  I'm seriously
> considering ripping out that code altogether, though.  I'm not sure we
> want to wade into attempting to accurately escape shell chars simply
> based on OS differences.
> 
> > 2) Parsing megablast output (format 2) with Bio::SearchIO::blast.pm
> > now returns an algorithm name of "BLASTN" instead of "MEGABLAST".
> > This change seems to have been introduced in revision 11579 of
> > blast.pm when a couple regex changes were made (lines 452 and 1201
> > of blast.pm in 1.5.9_2).  Subbing in the old regular expression for
> > megablast in line 452 returned the correct "MEGABLAST" algorithm name.
> 
> I worked out why that regex isn't working (it doesn't match MEGABLAST
> at all).  I fixed it and added a test for checking the algorithm to
> the test suite for MEGABLAST output, seems to work now.
> 
> > We also see some minor differences that we can live with, e.g.,
> > BLAST hit scores changing from 40 to 40.1 and e-values having
> > trailing zeros.  We'll just update our baselines.
> 
> Okay, but let me know if that becomes pressing.  The e-value issue is
> a bit odd and may be worth looking into.
> 
> > The change to using Bio::Annotation::TagTree for SwissProt sequence
> > gene names broke a number of our tests but we'll fix that by
> > modifying the adapters we use between our internal representation
> > and BioPerl's.
> 
> That would be from the switchover from StructureValue (which wasn't
> really designed for the purposes of storing such data).  A layered
> Bio::Annotation::Collection was the other option (this is almost a
> light version of that).
> 
> > One thing we haven't tracked down yet is a change in tag type, e.g.,
> > b:integervalue to b:stringvalue, in the XML representations of our
> > Pipeline Pilot data records.  We're only seeing this for programs in
> > NCBI's BLAST suite.  At this point we don't know what's changed on
> > the BioPerl side to trigger the change in our code.  We'll continue
> > to investigate this.
> 
> Again, if you find it's on our side let us know.
> 
> > Scott
> >
> > Scott Markel, Ph.D.
> > Principal Bioinformatics Architect  email:  smarkel at accelrys.com
> > Accelrys (SciTegic R&D)             mobile: +1 858 205 3653
> > 10188 Telesis Court, Suite 100      voice:  +1 858 799 5603
> > San Diego, CA 92121                 fax:    +1 858 799 5222
> > USA                                 web:    http://www.accelrys.com
> >
> > http://www.linkedin.com/in/smarkel
> > Board of Directors: International Society for Computational Biology
> > Co-chair: ISCB Publications Committee
> > Associate Editor: PLoS Computational Biology
> > Editorial Board: Briefings in Bioinformatics
> 
> Thanks Scott!  Let us know if you have any other problems.  I've been
> busier than expected but should get RC3 out soon.
> 
> -c
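
Illustration of issue 1 in the quoted thread (the bl2seq output filename on Windows). This is a rough sketch only, not the BioPerl code: it emulates Perl's quotemeta in Python for ASCII input, and the names perl_quotemeta and build_output_arg are hypothetical. It shows why the escaped path stops being a valid Windows path, and one possible shape for the OS check Chris describes.

```python
import re


def perl_quotemeta(text: str) -> str:
    """Emulate Perl's quotemeta for ASCII input (an approximation):
    put a backslash before every character outside [A-Za-z0-9_]."""
    return re.sub(r"([^A-Za-z0-9_])", r"\\\1", text)


def build_output_arg(path: str, is_windows: bool) -> str:
    """Hypothetical helper: skip the escaping on Windows, where the
    backslash is the path separator rather than an escape character."""
    return path if is_windows else perl_quotemeta(path)


windows_path = r"C:\DOCUME~1\outfile"

# Escaping every non-word character mangles the drive colon, the
# separators, and the tilde of the 8.3 short name.
print(perl_quotemeta(windows_path))

# With the OS check, the Windows path is passed through untouched.
print(build_output_arg(windows_path, is_windows=True))
```

The first print reproduces the mangled path quoted in Scott's report; the second leaves the path unchanged. Whether any shell escaping should be attempted at all is the open question Chris raises in his reply.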




